PK!ܚ>>)__pycache__/__init__.cpython-36.opt-1.pycnu[3 (6]XY@sdZdZdZdZdZdgZddlZddlZddlZddl Z ddl Z dd l m Z m Z dd lmZdd lmZmZmZmZmZmZmZmZmZmZmZd d kGdddeZeZeZGdddeZGddde Z!Gddde"Z#e$dkr ddlZeej%Z&e'e&j(dS)aHBeautiful Soup Elixir and Tonic "The Screen-Scraper's Friend" http://www.crummy.com/software/BeautifulSoup/ Beautiful Soup uses a pluggable XML or HTML parser to parse a (possibly invalid) document into a tree representation. Beautiful Soup provides methods and Pythonic idioms that make it easy to navigate, search, and modify the parse tree. Beautiful Soup works with Python 2.7 and up. It works better if lxml and/or html5lib is installed. For more than you ever wanted to know about Beautiful Soup, see the documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/ z*Leonard Richardson (leonardr@segfault.org)z4.6.3z*Copyright (c) 2004-2018 Leonard RichardsonZMIT BeautifulSoupN)builder_registryParserRejectedMarkup) UnicodeDammit) CDataCommentDEFAULT_OUTPUT_ENCODING DeclarationDoctypeNavigableString PageElementProcessingInstruction ResultSet SoupStrainerTagz`You are trying to run the Python 2 version of Beautiful Soup under Python 3. This will not work.zuYou need to convert the code, either by installing it (`python setup.py install`) or by running 2to3 (`2to3 -w bs4`).cseZdZdZdZddgZdZdZd2d d Zd d Z d dZ e ddZ ddZ ddZddifddZefddZddZddZddZdd Zefd!d"Zd3d#d$Zd4d&d'Zd(d)Zd5d*d+Zd,d-Zd.ed/ffd0d1 ZZS)6ra This class defines the basic interface called by the tree builders. 
These methods will be called by the parser: reset() feed(markup) The tree builder may call these methods from its feed() implementation: handle_starttag(name, attrs) # See note about return value handle_endtag(name) handle_data(data) # Appends to the current data node endData(containerClass=NavigableString) # Ends the current data node No matter how complicated the underlying parser is, you should be able to build a tree using 'start tag' events, 'end tag' events, 'data' events, and "done with data" events. If you encounter an empty-element tag (aka a self-closing tag, like HTML's
tag), call handle_starttag and then handle_endtag. z [document]ZhtmlZfastz aNo parser was explicitly specified, so I'm using the best available %(markup_type)s parser for this system ("%(parser)s"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. The code that caused this warning is on line %(line_number)s of the file %(filename)s. To get rid of this warning, pass the additional argument 'features="%(parser)s"' to the BeautifulSoup constructor. Nc" sdkrtjddkr*d=tjddkrBd=tjddkrZd=tjdd krrd =tjd fd d }|p|d d}|p|dd}|rt|trtjdd}tdkrtjj} td| |dkr|} t|tr|g}|dkst|dkr|j }t j |} | dkr@t ddj || }| |jkpZ| |jks|jrld} nd} d} ytjd} Wntk rYnX| r| j}| j}n tj}d}|jd}|r|j}|jd(r|dd)}|rt|||j| d}tj|j|dd||_|j|_|j|_||j_||_t |d rN|j!}nt|d!krt|t"rrd"|kst|trd#|krt|trt#j$j% r|j&d$}n|}d%}yt#j$j'|}Wn$t(k r}zWYdd}~XnX|rt|tr|j&d$}tjd&||j)|xZ|jj*|||d'D]D\|_+|_,|_-|_.|j/y|j0PWnt1k rrYnXq2Wd|_+d|j_dS)*a_Constructor. :param markup: A string or a file-like object representing markup to be parsed. :param features: Desirable features of the parser to be used. This may be the name of a specific parser ("lxml", "lxml-xml", "html.parser", or "html5lib") or it may be the type of markup to be used ("html", "html5", "xml"). It's recommended that you name a specific parser, so that Beautiful Soup gives you the same results across platforms and virtual environments. :param builder: A specific TreeBuilder to use instead of looking one up based on `features`. You shouldn't need to use this. :param parse_only: A SoupStrainer. Only parts of the document matching the SoupStrainer will be considered. This is useful when parsing part of a document that would otherwise be too large to fit into memory. :param from_encoding: A string indicating the encoding of the document to be parsed. Pass this in if Beautiful Soup is guessing wrongly about the document's encoding. 
:param exclude_encodings: A list of strings indicating encodings known to be wrong. Pass this in if you don't know the document's encoding but you know Beautiful Soup's guess is wrong. :param kwargs: For backwards compatibility purposes, the constructor accepts certain keyword arguments used in Beautiful Soup 3. None of these arguments do anything in Beautiful Soup 4 and there's no need to actually pass keyword arguments into the constructor. ZconvertEntitieszBS4 does not respect the convertEntities argument to the BeautifulSoup constructor. Entities are always converted to Unicode characters.Z markupMassagezBS4 does not respect the markupMassage argument to the BeautifulSoup constructor. The tree builder is responsible for any necessary markup massage.Z smartQuotesTozBS4 does not respect the smartQuotesTo argument to the BeautifulSoup constructor. Smart quotes are always converted to Unicode characters.ZselfClosingTagszBS4 does not respect the selfClosingTags argument to the BeautifulSoup constructor. The tree builder is responsible for understanding self-closing tags.ZisHTMLzBS4 does not respect the isHTML argument to the BeautifulSoup constructor. Suggest you use features='lxml' for HTML and features='lxml-xml' for XML.cs0|kr,tjd||f|}|=|SdS)NzLThe "%s" argument to the BeautifulSoup constructor has been renamed to "%s.")warningswarn)Zold_namenew_namevalue)kwargs/usr/lib/python3.6/__init__.pydeprecated_arguments z3BeautifulSoup.__init__..deprecated_argumentZparseOnlyThese parse_onlyZ fromEncoding from_encodingzlYou provided Unicode markup but also provided a value for from_encoding. Your from_encoding will be ignored.Nrz2__init__() got an unexpected keyword argument '%s'zjCouldn't find a tree builder with the features you requested: %s. Do you need to install a parser library?,ZXMLZHTMLr__file__.pyc.pyo)filename line_numberparser markup_type) stacklevelread<<utf8Fzw"%s" looks like a filename, not markup. 
You should probably open this file and pass the filehandle into Beautiful Soup.)exclude_encodings)rr )2rr isinstancestrlenlistkeyspop TypeErrorDEFAULT_BUILDER_FEATURESrlookupFeatureNotFoundjoinNAMEZALTERNATE_NAMESis_xmlsys _getframe ValueError f_globalsf_lineno__dict__getlowerendswithdictNO_PARSER_SPECIFIED_WARNINGbuilderZ known_xmlsouprhasattrr'bytesospathsupports_unicode_filenamesencodeexists Exception_check_markup_is_urlZprepare_markupmarkuporiginal_encodingZdeclared_html_encodingZcontains_replacement_charactersreset_feedr)selfrQfeaturesrFrrr,rrargZoriginal_featuresZ builder_classr$Zcallerglobalsr"r!ZfnlvaluesZpossible_filenameis_fileer)rr__init__Xs'                       zBeautifulSoup.__init__cCs&t||jd|jdd}|j|_|S)Nzutf-8)rFr)typerMrFrR)rUcopyrrr__copy__$szBeautifulSoup.__copy__cCs(t|j}d|kr$|jj r$d|d<|S)NrF)rDr@rFZ picklable)rUdrrr __getstate__0s zBeautifulSoup.__getstate__csxttrd}d }nttr(d}d}ndStfdd |Drt|krtttrbjd d }n}tjd |dS)z Check if markup looks like it's actually a url and raise a warning if so. Markup can be unicode or str (py2) / bytes (py3).  http:https: http:https:Nc3s|]}j|VqdS)N) startswith).0prefix)rQrr Fsz5BeautifulSoup._check_markup_is_url..zutf-8replacez"%s" looks like a URL. Beautiful Soup is not an HTTP client. 
You should probably use an HTTP client like requests to get the document behind the URL, and feed that document to Beautiful Soup.)rcrd)rfrg)r.rIr/anydecoderr)rQZspaceZcant_start_withZdecoded_markupr)rQrrP7s   z"BeautifulSoup._check_markup_is_urlcCs@|jj|jj|j|jx|jj|jkr:|jq"WdS)N) rFrSZfeedrQendData currentTagname ROOT_TAG_NAMEpopTag)rUrrrrTSs  zBeautifulSoup._feedcCsJtj|||j|jd|_|jjg|_d|_g|_g|_ |j |dS)Nr) rr\rFrrZhiddenrS current_datarptagStackpreserve_whitespace_tag_stackpushTag)rUrrrrS]s zBeautifulSoup.resetcKs|j|td|j||||S)z+Create a new tag associated with this soup.N)updaterrF)rUrq namespacensprefixattrsZkwattrsrrrnew_taggs zBeautifulSoup.new_tagcCs||S)z7Create a new NavigableString associated with this soup.r)rUssubclassrrr new_stringlszBeautifulSoup.new_stringcCs tddS)Nz4BeautifulSoup objects don't support insert_before().)NotImplementedError)rU successorrrr insert_beforepszBeautifulSoup.insert_beforecCs tddS)Nz3BeautifulSoup objects don't support insert_after().)r)rUrrrr insert_aftersszBeautifulSoup.insert_aftercCs@|jj}|jr(||jdkr(|jj|jr:|jd|_|jS)Nrr-r-)rur3rvrp)rUtagrrrrsvs    zBeautifulSoup.popTagcCsJ|jr|jjj||jj||jd|_|j|jjkrF|jj|dS)Nrr-)rpcontentsappendrurqrFZpreserve_whitespace_tagsrv)rUrrrrrws   zBeautifulSoup.pushTagcCs|jrdj|j}|jsPd}x|D]}||jkr"d}Pq"W|rPd|krLd}nd}g|_|jrt|jdkr|jj s|jj| rdS||}|j |dS)NrTF rer) rtr8rv ASCII_SPACESrr0rutextsearchobject_was_parsed)rUZcontainerClassrtZ strippableiorrrros&    zBeautifulSoup.endDatac CsV|p|j}|p|j}d}}}t|trF|j}|j}|j}|sF|j}|j|||||||_|j j ||jrRt |j d}x4|dkr|j ||krP|d8}qWt d||f|dkr|}d}n|j |d}}|t |j dkr|j}d}n|j |d}}||_|r||_||_|r.||_||_|r@||_||_|rR||_dS)z Add an object to the parse tree.Nrrz[Error building tree: supposedly %r was inserted into %r after the fact, but I don't see it!) 
rp_most_recent_elementr.r next_element next_siblingprevious_siblingprevious_elementZsetuprrr0r=) rUrparentZmost_recent_elementrrrrindexrrrrsR        zBeautifulSoup.object_was_parsedTcCsn||jkrdSd}t|j}xLt|dddD]8}|j|}||jkr^||jkr^|r\|j}P|j}q.W|S)zPops the tag stack up to and including the most recent instance of the given tag. If inclusivePop is false, pops the tag stack up to but *not* including the most recent instqance of the given tag.Nrrr-)rrr0rurangerqrjrs)rUrqrzZ inclusivePopZmost_recently_popped stack_sizertrrr _popToTags    zBeautifulSoup._popToTagc Cs|j|jr8t|jdkr8|jjs4|jj|| r8dSt||j|||||j|j }|dkr`|S|j rn||j _ ||_ |j ||S)aPush a start tag on to the stack. If this method returns None, the tag was rejected by the SoupStrainer. You should proceed as if the tag had not occurred in the document. For instance, if this was a self-closing tag, don't call handle_endtag. rN) rorr0rurZ search_tagrrFrprrrw)rUrqryrzr{rrrrhandle_starttags   zBeautifulSoup.handle_starttagcCs|j|j||dS)N)ror)rUrqrzrrr handle_endtagszBeautifulSoup.handle_endtagcCs|jj|dS)N)rtr)rUdatarrr handle_dataszBeautifulSoup.handle_dataFZminimalcsN|jr$d}|dkrd|}d|}nd}|s2d}nd}|tt|j|||S)zlReturns a string or Unicode representation of this document. To get Unicode, pass None for encoding.rNz encoding="%s"z r)r:superrrn)rUZ pretty_printZeventual_encodingZ formatterZ encoding_partrjZ indent_level) __class__rrrns  zBeautifulSoup.decode)rNNNNN)NN)NT)N)__name__ __module__ __qualname____doc__rrr5rrEr\r_ra staticmethodrPrTrSr|r rrrrsrwrorrrrrr rn __classcell__rr)rrr8s8 L        9  cs eZdZdZfddZZS)BeautifulStoneSoupz&Deprecated interface to an XML parser.cs(d|d<tjdtt|j||dS)NZxmlrVzxThe BeautifulStoneSoup class is deprecated. 
Instead of using it, pass features="xml" into the BeautifulSoup constructor.)rrrrr\)rUargsr)rrrr\5szBeautifulStoneSoup.__init__)rrrrr\rrr)rrr2src@s eZdZdS) StopParsingN)rrrrrrrr=src@s eZdZdS)r7N)rrrrrrrr7@sr7__main__))r __author__ __version__Z __copyright__Z __license____all__rJrer; tracebackrrFrrZdammitrelementrrr r r r r rrrrrZ_sZ_souprrOrr=r7rstdinrGprintZprettifyrrrrs6 4z   PK!ܚ>>#__pycache__/__init__.cpython-36.pycnu[3 (6]XY@sdZdZdZdZdZdgZddlZddlZddlZddl Z ddl Z dd l m Z m Z dd lmZdd lmZmZmZmZmZmZmZmZmZmZmZd d kGdddeZeZeZGdddeZGddde Z!Gddde"Z#e$dkr ddlZeej%Z&e'e&j(dS)aHBeautiful Soup Elixir and Tonic "The Screen-Scraper's Friend" http://www.crummy.com/software/BeautifulSoup/ Beautiful Soup uses a pluggable XML or HTML parser to parse a (possibly invalid) document into a tree representation. Beautiful Soup provides methods and Pythonic idioms that make it easy to navigate, search, and modify the parse tree. Beautiful Soup works with Python 2.7 and up. It works better if lxml and/or html5lib is installed. For more than you ever wanted to know about Beautiful Soup, see the documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/ z*Leonard Richardson (leonardr@segfault.org)z4.6.3z*Copyright (c) 2004-2018 Leonard RichardsonZMIT BeautifulSoupN)builder_registryParserRejectedMarkup) UnicodeDammit) CDataCommentDEFAULT_OUTPUT_ENCODING DeclarationDoctypeNavigableString PageElementProcessingInstruction ResultSet SoupStrainerTagz`You are trying to run the Python 2 version of Beautiful Soup under Python 3. This will not work.zuYou need to convert the code, either by installing it (`python setup.py install`) or by running 2to3 (`2to3 -w bs4`).cseZdZdZdZddgZdZdZd2d d Zd d Z d dZ e ddZ ddZ ddZddifddZefddZddZddZddZdd Zefd!d"Zd3d#d$Zd4d&d'Zd(d)Zd5d*d+Zd,d-Zd.ed/ffd0d1 ZZS)6ra This class defines the basic interface called by the tree builders. 
These methods will be called by the parser: reset() feed(markup) The tree builder may call these methods from its feed() implementation: handle_starttag(name, attrs) # See note about return value handle_endtag(name) handle_data(data) # Appends to the current data node endData(containerClass=NavigableString) # Ends the current data node No matter how complicated the underlying parser is, you should be able to build a tree using 'start tag' events, 'end tag' events, 'data' events, and "done with data" events. If you encounter an empty-element tag (aka a self-closing tag, like HTML's
tag), call handle_starttag and then handle_endtag. z [document]ZhtmlZfastz aNo parser was explicitly specified, so I'm using the best available %(markup_type)s parser for this system ("%(parser)s"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. The code that caused this warning is on line %(line_number)s of the file %(filename)s. To get rid of this warning, pass the additional argument 'features="%(parser)s"' to the BeautifulSoup constructor. Nc" sdkrtjddkr*d=tjddkrBd=tjddkrZd=tjdd krrd =tjd fd d }|p|d d}|p|dd}|rt|trtjdd}tdkrtjj} td| |dkr|} t|tr|g}|dkst|dkr|j }t j |} | dkr@t ddj || }| |jkpZ| |jks|jrld} nd} d} ytjd} Wntk rYnX| r| j}| j}n tj}d}|jd}|r|j}|jd(r|dd)}|rt|||j| d}tj|j|dd||_|j|_|j|_||j_||_t |d rN|j!}nt|d!krt|t"rrd"|kst|trd#|krt|trt#j$j% r|j&d$}n|}d%}yt#j$j'|}Wn$t(k r}zWYdd}~XnX|rt|tr|j&d$}tjd&||j)|xZ|jj*|||d'D]D\|_+|_,|_-|_.|j/y|j0PWnt1k rrYnXq2Wd|_+d|j_dS)*a_Constructor. :param markup: A string or a file-like object representing markup to be parsed. :param features: Desirable features of the parser to be used. This may be the name of a specific parser ("lxml", "lxml-xml", "html.parser", or "html5lib") or it may be the type of markup to be used ("html", "html5", "xml"). It's recommended that you name a specific parser, so that Beautiful Soup gives you the same results across platforms and virtual environments. :param builder: A specific TreeBuilder to use instead of looking one up based on `features`. You shouldn't need to use this. :param parse_only: A SoupStrainer. Only parts of the document matching the SoupStrainer will be considered. This is useful when parsing part of a document that would otherwise be too large to fit into memory. :param from_encoding: A string indicating the encoding of the document to be parsed. Pass this in if Beautiful Soup is guessing wrongly about the document's encoding. 
:param exclude_encodings: A list of strings indicating encodings known to be wrong. Pass this in if you don't know the document's encoding but you know Beautiful Soup's guess is wrong. :param kwargs: For backwards compatibility purposes, the constructor accepts certain keyword arguments used in Beautiful Soup 3. None of these arguments do anything in Beautiful Soup 4 and there's no need to actually pass keyword arguments into the constructor. ZconvertEntitieszBS4 does not respect the convertEntities argument to the BeautifulSoup constructor. Entities are always converted to Unicode characters.Z markupMassagezBS4 does not respect the markupMassage argument to the BeautifulSoup constructor. The tree builder is responsible for any necessary markup massage.Z smartQuotesTozBS4 does not respect the smartQuotesTo argument to the BeautifulSoup constructor. Smart quotes are always converted to Unicode characters.ZselfClosingTagszBS4 does not respect the selfClosingTags argument to the BeautifulSoup constructor. The tree builder is responsible for understanding self-closing tags.ZisHTMLzBS4 does not respect the isHTML argument to the BeautifulSoup constructor. Suggest you use features='lxml' for HTML and features='lxml-xml' for XML.cs0|kr,tjd||f|}|=|SdS)NzLThe "%s" argument to the BeautifulSoup constructor has been renamed to "%s.")warningswarn)Zold_namenew_namevalue)kwargs/usr/lib/python3.6/__init__.pydeprecated_arguments z3BeautifulSoup.__init__..deprecated_argumentZparseOnlyThese parse_onlyZ fromEncoding from_encodingzlYou provided Unicode markup but also provided a value for from_encoding. Your from_encoding will be ignored.Nrz2__init__() got an unexpected keyword argument '%s'zjCouldn't find a tree builder with the features you requested: %s. Do you need to install a parser library?,ZXMLZHTMLr__file__.pyc.pyo)filename line_numberparser markup_type) stacklevelread<<utf8Fzw"%s" looks like a filename, not markup. 
You should probably open this file and pass the filehandle into Beautiful Soup.)exclude_encodings)rr )2rr isinstancestrlenlistkeyspop TypeErrorDEFAULT_BUILDER_FEATURESrlookupFeatureNotFoundjoinNAMEZALTERNATE_NAMESis_xmlsys _getframe ValueError f_globalsf_lineno__dict__getlowerendswithdictNO_PARSER_SPECIFIED_WARNINGbuilderZ known_xmlsouprhasattrr'bytesospathsupports_unicode_filenamesencodeexists Exception_check_markup_is_urlZprepare_markupmarkuporiginal_encodingZdeclared_html_encodingZcontains_replacement_charactersreset_feedr)selfrQfeaturesrFrrr,rrargZoriginal_featuresZ builder_classr$Zcallerglobalsr"r!ZfnlvaluesZpossible_filenameis_fileer)rr__init__Xs'                       zBeautifulSoup.__init__cCs&t||jd|jdd}|j|_|S)Nzutf-8)rFr)typerMrFrR)rUcopyrrr__copy__$szBeautifulSoup.__copy__cCs(t|j}d|kr$|jj r$d|d<|S)NrF)rDr@rFZ picklable)rUdrrr __getstate__0s zBeautifulSoup.__getstate__csxttrd}d }nttr(d}d}ndStfdd |Drt|krtttrbjd d }n}tjd |dS)z Check if markup looks like it's actually a url and raise a warning if so. Markup can be unicode or str (py2) / bytes (py3).  http:https: http:https:Nc3s|]}j|VqdS)N) startswith).0prefix)rQrr Fsz5BeautifulSoup._check_markup_is_url..zutf-8replacez"%s" looks like a URL. Beautiful Soup is not an HTTP client. 
You should probably use an HTTP client like requests to get the document behind the URL, and feed that document to Beautiful Soup.)rcrd)rfrg)r.rIr/anydecoderr)rQZspaceZcant_start_withZdecoded_markupr)rQrrP7s   z"BeautifulSoup._check_markup_is_urlcCs@|jj|jj|j|jx|jj|jkr:|jq"WdS)N) rFrSZfeedrQendData currentTagname ROOT_TAG_NAMEpopTag)rUrrrrTSs  zBeautifulSoup._feedcCsJtj|||j|jd|_|jjg|_d|_g|_g|_ |j |dS)Nr) rr\rFrrZhiddenrS current_datarptagStackpreserve_whitespace_tag_stackpushTag)rUrrrrS]s zBeautifulSoup.resetcKs|j|td|j||||S)z+Create a new tag associated with this soup.N)updaterrF)rUrq namespacensprefixattrsZkwattrsrrrnew_taggs zBeautifulSoup.new_tagcCs||S)z7Create a new NavigableString associated with this soup.r)rUssubclassrrr new_stringlszBeautifulSoup.new_stringcCs tddS)Nz4BeautifulSoup objects don't support insert_before().)NotImplementedError)rU successorrrr insert_beforepszBeautifulSoup.insert_beforecCs tddS)Nz3BeautifulSoup objects don't support insert_after().)r)rUrrrr insert_aftersszBeautifulSoup.insert_aftercCs@|jj}|jr(||jdkr(|jj|jr:|jd|_|jS)Nrr-r-)rur3rvrp)rUtagrrrrsvs    zBeautifulSoup.popTagcCsJ|jr|jjj||jj||jd|_|j|jjkrF|jj|dS)Nrr-)rpcontentsappendrurqrFZpreserve_whitespace_tagsrv)rUrrrrrws   zBeautifulSoup.pushTagcCs|jrdj|j}|jsPd}x|D]}||jkr"d}Pq"W|rPd|krLd}nd}g|_|jrt|jdkr|jj s|jj| rdS||}|j |dS)NrTF rer) rtr8rv ASCII_SPACESrr0rutextsearchobject_was_parsed)rUZcontainerClassrtZ strippableiorrrros&    zBeautifulSoup.endDatac CsV|p|j}|p|j}d}}}t|trF|j}|j}|j}|sF|j}|j|||||||_|j j ||jrRt |j d}x4|dkr|j ||krP|d8}qWt d||f|dkr|}d}n|j |d}}|t |j dkr|j}d}n|j |d}}||_|r||_||_|r.||_||_|r@||_||_|rR||_dS)z Add an object to the parse tree.Nrrz[Error building tree: supposedly %r was inserted into %r after the fact, but I don't see it!) 
rp_most_recent_elementr.r next_element next_siblingprevious_siblingprevious_elementZsetuprrr0r=) rUrparentZmost_recent_elementrrrrindexrrrrsR        zBeautifulSoup.object_was_parsedTcCsn||jkrdSd}t|j}xLt|dddD]8}|j|}||jkr^||jkr^|r\|j}P|j}q.W|S)zPops the tag stack up to and including the most recent instance of the given tag. If inclusivePop is false, pops the tag stack up to but *not* including the most recent instqance of the given tag.Nrrr-)rrr0rurangerqrjrs)rUrqrzZ inclusivePopZmost_recently_popped stack_sizertrrr _popToTags    zBeautifulSoup._popToTagc Cs|j|jr8t|jdkr8|jjs4|jj|| r8dSt||j|||||j|j }|dkr`|S|j rn||j _ ||_ |j ||S)aPush a start tag on to the stack. If this method returns None, the tag was rejected by the SoupStrainer. You should proceed as if the tag had not occurred in the document. For instance, if this was a self-closing tag, don't call handle_endtag. rN) rorr0rurZ search_tagrrFrprrrw)rUrqryrzr{rrrrhandle_starttags   zBeautifulSoup.handle_starttagcCs|j|j||dS)N)ror)rUrqrzrrr handle_endtagszBeautifulSoup.handle_endtagcCs|jj|dS)N)rtr)rUdatarrr handle_dataszBeautifulSoup.handle_dataFZminimalcsN|jr$d}|dkrd|}d|}nd}|s2d}nd}|tt|j|||S)zlReturns a string or Unicode representation of this document. To get Unicode, pass None for encoding.rNz encoding="%s"z r)r:superrrn)rUZ pretty_printZeventual_encodingZ formatterZ encoding_partrjZ indent_level) __class__rrrns  zBeautifulSoup.decode)rNNNNN)NN)NT)N)__name__ __module__ __qualname____doc__rrr5rrEr\r_ra staticmethodrPrTrSr|r rrrrsrwrorrrrrr rn __classcell__rr)rrr8s8 L        9  cs eZdZdZfddZZS)BeautifulStoneSoupz&Deprecated interface to an XML parser.cs(d|d<tjdtt|j||dS)NZxmlrVzxThe BeautifulStoneSoup class is deprecated. 
Instead of using it, pass features="xml" into the BeautifulSoup constructor.)rrrrr\)rUargsr)rrrr\5szBeautifulStoneSoup.__init__)rrrrr\rrr)rrr2src@s eZdZdS) StopParsingN)rrrrrrrr=src@s eZdZdS)r7N)rrrrrrrr7@sr7__main__))r __author__ __version__Z __copyright__Z __license____all__rJrer; tracebackrrFrrZdammitrelementrrr r r r r rrrrrZ_sZ_souprrOrr=r7rstdinrGprintZprettifyrrrrs6 4z   PK!MBHH'__pycache__/dammit.cpython-36.opt-1.pycnu[3 6]t@s dZdZddlZddlmZddlZddlZddlZdZyddl Z ddZ WnFe k ryddl Z ddZ Wne k rddZ YnXYnXy ddl Z Wne k rYnXejd jejZejd jejZGd d d eZGd ddZGdddZdS)aBBeautiful Soup bonus library: Unicode, Dammit This library converts a bytestream to Unicode through any means necessary. It is heavily based on code from Mark Pilgrim's Universal Feed Parser. It works best on XML and HTML, but it does not rewrite the XML or HTML to reflect a new encoding; that's the tree builder's job. ZMITN)codepoint2namecCstj|dS)Nencoding)cchardetdetect)sr/usr/lib/python3.6/dammit.pychardet_dammitsr cCstj|dS)Nr)chardetr)rrrrr !scCsdS)Nr)rrrrr 'sz!^<\?.*encoding=['"](.*?)['"].*\?>z0<\s*meta[^>]+charset\s*=\s*["']?([^>]*?)[ /;'">]c@seZdZdZddZe\ZZZdddddd Ze j d Z e j d Z e d d Ze ddZe ddZe dddZe dddZe ddZdS)EntitySubstitutionzASubstitute XML or HTML entities for the corresponding characters.cCsni}i}g}xBttjD]2\}}t|}|dkrD|j||||<|||<qWddj|}||tj|fS)N"z[%s])listritemschrappendjoinrecompile)lookupZreverse_lookupZcharacters_for_reZ codepointname characterZ re_definitionrrr_populate_class_variables9s  z,EntitySubstitution._populate_class_variablesZaposZquotZampltgt)'"&<>z&([<>]|&(?!#\d+;|#x[0-9a-fA-F]+;|\w+;))z([<>&])cCs|jj|jd}d|S)Nrz&%s;)CHARACTER_TO_HTML_ENTITYgetgroup)clsmatchobjentityrrr_substitute_html_entityZsz*EntitySubstitution._substitute_html_entitycCs|j|jd}d|S)zmUsed with a regular expression to substitute the appropriate XML entity for an XML special 
character.rz&%s;)CHARACTER_TO_XML_ENTITYr")r#r$r%rrr_substitute_xml_entity_sz)EntitySubstitution._substitute_xml_entitycCs6d}d|kr*d|kr&d}|jd|}nd}|||S)a*Make a value into a quoted XML attribute, possibly escaping it. Most strings will be quoted using double quotes. Bob's Bar -> "Bob's Bar" If a string contains double quotes, it will be quoted using single quotes. Welcome to "my bar" -> 'Welcome to "my bar"' If a string contains both single and double quotes, the double quotes will be escaped, and the string will be quoted using double quotes. Welcome to "Bob's Bar" -> "Welcome to "Bob's bar" rrz")replace)selfvalueZ quote_withZ replace_withrrrquoted_attribute_valuefsz)EntitySubstitution.quoted_attribute_valueFcCs"|jj|j|}|r|j|}|S)a Substitute XML entities for special XML characters. :param value: A string to be substituted. The less-than sign will become <, the greater-than sign will become >, and any ampersands will become &. If you want ampersands that appear to be part of an entity definition to be left alone, use substitute_xml_containing_entities() instead. :param make_quoted_attribute: If True, then the string will be quoted, as befits an attribute value. )AMPERSAND_OR_BRACKETsubr(r,)r#r+make_quoted_attributerrrsubstitute_xmls   z!EntitySubstitution.substitute_xmlcCs"|jj|j|}|r|j|}|S)aSubstitute XML entities for special XML characters. :param value: A string to be substituted. The less-than sign will become <, the greater-than sign will become >, and any ampersands that are not part of an entity defition will become &. :param make_quoted_attribute: If True, then the string will be quoted, as befits an attribute value. )BARE_AMPERSAND_OR_BRACKETr.r(r,)r#r+r/rrr"substitute_xml_containing_entitiess   z5EntitySubstitution.substitute_xml_containing_entitiescCs|jj|j|S)aReplace certain Unicode characters with named HTML entities. 
This differs from data.encode(encoding, 'xmlcharrefreplace') in that the goal is to make the result more readable (to those with ASCII displays) rather than to recover from errors. There's absolutely nothing wrong with a UTF-8 string containg a LATIN SMALL LETTER E WITH ACUTE, but replacing that character with "é" will make it more readable to some people. )CHARACTER_TO_HTML_ENTITY_REr.r&)r#rrrrsubstitute_htmls z"EntitySubstitution.substitute_htmlN)F)F)__name__ __module__ __qualname____doc__rr ZHTML_ENTITY_TO_CHARACTERr3r'rrr1r- classmethodr&r(r,r0r2r4rrrrr 5s$      %  r c@sHeZdZdZdddZddZedd Zed d Z edd d Z dS)EncodingDetectora^Suggests a number of possible encodings for a bytestring. Order of precedence: 1. Encodings you specifically tell EncodingDetector to try first (the override_encodings argument to the constructor). 2. An encoding declared within the bytestring itself, either in an XML declaration (if the bytestring is to be interpreted as an XML document), or in a tag (if the bytestring is to be interpreted as an HTML document.) 3. An encoding detected through textual analysis by chardet, cchardet, or a similar external library. 4. UTF-8. 5. Windows-1252. NFcCsN|pg|_|pg}tdd|D|_d|_||_d|_|j|\|_|_dS)NcSsg|] }|jqSr)lower).0xrrr sz-EncodingDetector.__init__..) override_encodingssetexclude_encodingschardet_encodingis_htmldeclared_encodingstrip_byte_order_markmarkupsniffed_encoding)r*rFr?rCrArrr__init__s zEncodingDetector.__init__cCs8|dk r4|j}||jkrdS||kr4|j|dSdS)NFT)r;rAadd)r*rtriedrrr_usables  zEncodingDetector._usableccst}x |jD]}|j||r|VqW|j|j|r>|jV|jdkrZ|j|j|j|_|j|j|rp|jV|jdkrt |j|_|j|j|r|jVxdD]}|j||r|VqWdS)z tag, hopefully near the beginning of the document. iig?N)endposrasciir)) rVmaxintxml_encoding_research html_meta_regroupsdecoder;)r#rFrCZsearch_entire_documentZ xml_endposZ html_endposrDZdeclared_encoding_matchrrrrN+s   z'EncodingDetector.find_declared_encoding)NFN)FF) r5r6r7r8rHrKpropertyrPr9rErNrrrrr:s  ! 
r:c@sReZdZdZdddZdddgZgdd gfd d Zd d ZdddZdddZ e ddZ ddZ ddZ ddd d!d"d#d$d%d&d'd(d)d*d2d+d2d2d,d-d.d/d0d1d2d3d4d5d6d7d2d8d9dQ ZdRddSdTdUdVdWdXdYdZd[d\d]d2d^d2d2d_d_d`d`dadbdcdddedfdgdhd2didjddkdldmdndodpd[dqdPdrdsdkddtdbdudvdwdxd:dzd{dadSd|drd}d~ddd2ddddddddddddddddddddddddaddddddjddddddddddlddddddddduddudududududdudzdzdzdzddddZdddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd d d d d dzZd;d<d=gZeddZed>dZed?ddZdS(@ UnicodeDammitzA class for detecting the encoding of a *ML document and converting it to a Unicode string. If the source encoding is windows-1252, can replace MS smart quotes with their HTML or XML equivalents.z mac-romanz shift-jis) macintoshzx-sjis windows-1252z iso-8859-1z iso-8859-2NFcCs||_g|_d|_||_tjt|_t|||||_ t |t sF|dkr`||_ t ||_ d|_dS|j j |_ d}x,|j jD] }|j j }|j|}|dk rxPqxW|sx@|j jD]4}|dkr|j|d}|dk r|jjdd|_PqW||_ |sd|_dS)NFr rYr)zSSome characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.T)smart_quotes_totried_encodingsZcontains_replacement_charactersrCloggingZ getLoggerr5logr:detectorrTrUrFZunicode_markuporiginal_encodingrP _convert_fromZwarning)r*rFr?rerCrAurrrrrHXs>     zUnicodeDammit.__init__cCs|jd}|jdkr&|jj|j}nf|jj|}t|tkr|jdkrfdj|djdj}qdj|djdj}n|j}|S)z[Changes a MS smart quote character to an XML or HTML entity, or an ASCII character.rYZxmlz&#x;rr)r"reMS_CHARS_TO_ASCIIr!encodeMS_CHARStypetuple)r*matchZorigr.rrr _sub_ms_chars     zUnicodeDammit._sub_ms_charstrictcCs|j|}| s||f|jkr"dS|jj||f|j}|jdk rh||jkrhd}tj|}|j|j |}y|j |||}||_||_ Wn t k r}zdSd}~XnX|jS)Ns([-])) find_codecrfrrFreENCODINGS_WITH_SMART_QUOTESrrr.ru _to_unicoderj Exception)r*ZproposederrorsrFZsmart_quotes_reZsmart_quotes_compiledrlrOrrrrks"     zUnicodeDammit._convert_fromcCs t|||S)zGiven a string and its encoding, decodes the string into Unicode. 
%encoding is a string recognized by encodings.aliases)rU)r*rWrr{rrrryszUnicodeDammit._to_unicodecCs|js dS|jjS)N)rCrirD)r*rrrdeclared_html_encodingsz$UnicodeDammit.declared_html_encodingcCs`|j|jj||pN|r*|j|jddpN|r@|j|jddpN|rL|jpN|}|r\|jSdS)N-r _)_codecCHARSET_ALIASESr!r)r;)r*charsetr+rrrrws zUnicodeDammit.find_codecc Cs<|s|Sd}ytj||}Wnttfk r6YnX|S)N)codecsr LookupError ValueError)r*rcodecrrrrs zUnicodeDammit._codeceuro20AC sbquo201Afnof192bdquo201Ehellip2026dagger2020Dagger2021circ2C6permil2030Scaron160lsaquo2039OElig152?#x17D17Dlsquo2018rsquo2019ldquo201Crdquo201Dbull2022ndash2013mdash2014tilde2DCtrade2122scaron161rsaquo203Aoelig153#x17E17EYumlr ) ZEUR,fz,,z...+z++^%SrZOEZrr*r}z--~z(TM)rrZoezY!cZGBP$ZYEN|z..z(th)z<>z1/4z1/2z3/4AZAECEIDNOUbBaZaerOin/y)rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrs€s‚sƒs„s…s†s‡sˆs‰sŠs‹sŒsŽs‘s’s“s”s•s–s—s˜s™sšs›sœsžsŸs s¡s¢s£s¤s¥s¦s§s¨s©sªs«s¬s­s®s¯s°s±s²s³s´sµs¶s·s¸s¹sºs»s¼s½s¾s¿sÀsÁsÂsÃsÄsÅsÆsÇsÈsÉsÊsËsÌsÍsÎsÏsÐsÑsÒsÓsÔsÕsÖs×sØsÙsÚsÛsÜsÝsÞsßsàrsâsãsäsåsæsçsèsésêsësìsísîsïsðsñsòsósôsõsös÷søsùsúsûsüsýsþ)zrrrRrrrSrrrQrrmutf8c Cs$|jddjdkrtd|jdkr0tdg}d }d }x|t|kr||}t|tsft|}||jkr||jkrxz|j D]$\}} } ||kr|| kr|| 7}PqWq>|d kr||j kr|j ||||j |j ||d 7}|}q>|d 7}q>W|d kr|S|j ||d d j |S)aFix characters from one encoding embedded in some other encoding. Currently the only situation supported is Windows-1252 (or its subset ISO-8859-1), embedded in UTF-8. The input must be a bytestring. If you've already converted the document to Unicode, you're too late. The output is a bytestring in which `embedded_encoding` characters have been converted to their `main_encoding` equivalents. 
r~r} windows-1252 windows_1252zPWindows-1252 and ISO-8859-1 are the only currently supported embedded encodings.rutf-8z4UTF-8 is the only currently supported main encoding.rrarmN)rr)rr) r)r;NotImplementedErrorrVrTr[ordFIRST_MULTIBYTE_MARKERLAST_MULTIBYTE_MARKERMULTIBYTE_MARKERS_AND_SIZESWINDOWS_1252_TO_UTF8rr) r#Zin_bytesZ main_encodingZembedded_encodingZ byte_chunksZ chunk_startposZbytestartendsizerrr detwingle s<      zUnicodeDammit.detwingle)rv)rv)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr )rr)rrrR)rrrS)rrrQ)rrd)r5r6r7r8rrxrHrurkryrar|rwrrqrorrrrr9rrrrrrbEs`1        rb)r8Z __license__rZ html.entitiesrrrgstringZ chardet_typerr ImportErrorr Z iconv_codecrrprr\r^objectr r:rbrrrrs8    PK!MBHH!__pycache__/dammit.cpython-36.pycnu[3 6]t@s dZdZddlZddlmZddlZddlZddlZdZyddl Z ddZ WnFe k ryddl Z ddZ Wne k rddZ YnXYnXy ddl Z Wne k rYnXejd jejZejd jejZGd d d eZGd ddZGdddZdS)aBBeautiful Soup bonus library: Unicode, Dammit This library converts a bytestream to Unicode through any means necessary. It is heavily based on code from Mark Pilgrim's Universal Feed Parser. It works best on XML and HTML, but it does not rewrite the XML or HTML to reflect a new encoding; that's the tree builder's job. 
ZMITN)codepoint2namecCstj|dS)Nencoding)cchardetdetect)sr/usr/lib/python3.6/dammit.pychardet_dammitsr cCstj|dS)Nr)chardetr)rrrrr !scCsdS)Nr)rrrrr 'sz!^<\?.*encoding=['"](.*?)['"].*\?>z0<\s*meta[^>]+charset\s*=\s*["']?([^>]*?)[ /;'">]c@seZdZdZddZe\ZZZdddddd Ze j d Z e j d Z e d d Ze ddZe ddZe dddZe dddZe ddZdS)EntitySubstitutionzASubstitute XML or HTML entities for the corresponding characters.cCsni}i}g}xBttjD]2\}}t|}|dkrD|j||||<|||<qWddj|}||tj|fS)N"z[%s])listritemschrappendjoinrecompile)lookupZreverse_lookupZcharacters_for_reZ codepointname characterZ re_definitionrrr_populate_class_variables9s  z,EntitySubstitution._populate_class_variablesZaposZquotZampltgt)'"&<>z&([<>]|&(?!#\d+;|#x[0-9a-fA-F]+;|\w+;))z([<>&])cCs|jj|jd}d|S)Nrz&%s;)CHARACTER_TO_HTML_ENTITYgetgroup)clsmatchobjentityrrr_substitute_html_entityZsz*EntitySubstitution._substitute_html_entitycCs|j|jd}d|S)zmUsed with a regular expression to substitute the appropriate XML entity for an XML special character.rz&%s;)CHARACTER_TO_XML_ENTITYr")r#r$r%rrr_substitute_xml_entity_sz)EntitySubstitution._substitute_xml_entitycCs6d}d|kr*d|kr&d}|jd|}nd}|||S)a*Make a value into a quoted XML attribute, possibly escaping it. Most strings will be quoted using double quotes. Bob's Bar -> "Bob's Bar" If a string contains double quotes, it will be quoted using single quotes. Welcome to "my bar" -> 'Welcome to "my bar"' If a string contains both single and double quotes, the double quotes will be escaped, and the string will be quoted using double quotes. Welcome to "Bob's Bar" -> "Welcome to "Bob's bar" rrz")replace)selfvalueZ quote_withZ replace_withrrrquoted_attribute_valuefsz)EntitySubstitution.quoted_attribute_valueFcCs"|jj|j|}|r|j|}|S)a Substitute XML entities for special XML characters. :param value: A string to be substituted. The less-than sign will become <, the greater-than sign will become >, and any ampersands will become &. 
If you want ampersands that appear to be part of an entity definition to be left alone, use substitute_xml_containing_entities() instead. :param make_quoted_attribute: If True, then the string will be quoted, as befits an attribute value. )AMPERSAND_OR_BRACKETsubr(r,)r#r+make_quoted_attributerrrsubstitute_xmls   z!EntitySubstitution.substitute_xmlcCs"|jj|j|}|r|j|}|S)aSubstitute XML entities for special XML characters. :param value: A string to be substituted. The less-than sign will become <, the greater-than sign will become >, and any ampersands that are not part of an entity defition will become &. :param make_quoted_attribute: If True, then the string will be quoted, as befits an attribute value. )BARE_AMPERSAND_OR_BRACKETr.r(r,)r#r+r/rrr"substitute_xml_containing_entitiess   z5EntitySubstitution.substitute_xml_containing_entitiescCs|jj|j|S)aReplace certain Unicode characters with named HTML entities. This differs from data.encode(encoding, 'xmlcharrefreplace') in that the goal is to make the result more readable (to those with ASCII displays) rather than to recover from errors. There's absolutely nothing wrong with a UTF-8 string containg a LATIN SMALL LETTER E WITH ACUTE, but replacing that character with "é" will make it more readable to some people. )CHARACTER_TO_HTML_ENTITY_REr.r&)r#rrrrsubstitute_htmls z"EntitySubstitution.substitute_htmlN)F)F)__name__ __module__ __qualname____doc__rr ZHTML_ENTITY_TO_CHARACTERr3r'rrr1r- classmethodr&r(r,r0r2r4rrrrr 5s$      %  r c@sHeZdZdZdddZddZedd Zed d Z edd d Z dS)EncodingDetectora^Suggests a number of possible encodings for a bytestring. Order of precedence: 1. Encodings you specifically tell EncodingDetector to try first (the override_encodings argument to the constructor). 2. An encoding declared within the bytestring itself, either in an XML declaration (if the bytestring is to be interpreted as an XML document), or in a tag (if the bytestring is to be interpreted as an HTML document.) 3. 
An encoding detected through textual analysis by chardet, cchardet, or a similar external library. 4. UTF-8. 5. Windows-1252. NFcCsN|pg|_|pg}tdd|D|_d|_||_d|_|j|\|_|_dS)NcSsg|] }|jqSr)lower).0xrrr sz-EncodingDetector.__init__..) override_encodingssetexclude_encodingschardet_encodingis_htmldeclared_encodingstrip_byte_order_markmarkupsniffed_encoding)r*rFr?rCrArrr__init__s zEncodingDetector.__init__cCs8|dk r4|j}||jkrdS||kr4|j|dSdS)NFT)r;rAadd)r*rtriedrrr_usables  zEncodingDetector._usableccst}x |jD]}|j||r|VqW|j|j|r>|jV|jdkrZ|j|j|j|_|j|j|rp|jV|jdkrt |j|_|j|j|r|jVxdD]}|j||r|VqWdS)z tag, hopefully near the beginning of the document. iig?N)endposrasciir)) rVmaxintxml_encoding_research html_meta_regroupsdecoder;)r#rFrCZsearch_entire_documentZ xml_endposZ html_endposrDZdeclared_encoding_matchrrrrN+s   z'EncodingDetector.find_declared_encoding)NFN)FF) r5r6r7r8rHrKpropertyrPr9rErNrrrrr:s  ! r:c@sReZdZdZdddZdddgZgdd gfd d Zd d ZdddZdddZ e ddZ ddZ ddZ ddd d!d"d#d$d%d&d'd(d)d*d2d+d2d2d,d-d.d/d0d1d2d3d4d5d6d7d2d8d9dQ ZdRddSdTdUdVdWdXdYdZd[d\d]d2d^d2d2d_d_d`d`dadbdcdddedfdgdhd2didjddkdldmdndodpd[dqdPdrdsdkddtdbdudvdwdxd:dzd{dadSd|drd}d~ddd2ddddddddddddddddddddddddaddddddjddddddddddlddddddddduddudududududdudzdzdzdzddddZdddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd d d d d dzZd;d<d=gZeddZed>dZed?ddZdS(@ UnicodeDammitzA class for detecting the encoding of a *ML document and converting it to a Unicode string. 
If the source encoding is windows-1252, can replace MS smart quotes with their HTML or XML equivalents.z mac-romanz shift-jis) macintoshzx-sjis windows-1252z iso-8859-1z iso-8859-2NFcCs||_g|_d|_||_tjt|_t|||||_ t |t sF|dkr`||_ t ||_ d|_dS|j j |_ d}x,|j jD] }|j j }|j|}|dk rxPqxW|sx@|j jD]4}|dkr|j|d}|dk r|jjdd|_PqW||_ |sd|_dS)NFr rYr)zSSome characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.T)smart_quotes_totried_encodingsZcontains_replacement_charactersrCloggingZ getLoggerr5logr:detectorrTrUrFZunicode_markuporiginal_encodingrP _convert_fromZwarning)r*rFr?rerCrAurrrrrHXs>     zUnicodeDammit.__init__cCs|jd}|jdkr&|jj|j}nf|jj|}t|tkr|jdkrfdj|djdj}qdj|djdj}n|j}|S)z[Changes a MS smart quote character to an XML or HTML entity, or an ASCII character.rYZxmlz&#x;rr)r"reMS_CHARS_TO_ASCIIr!encodeMS_CHARStypetuple)r*matchZorigr.rrr _sub_ms_chars     zUnicodeDammit._sub_ms_charstrictcCs|j|}| s||f|jkr"dS|jj||f|j}|jdk rh||jkrhd}tj|}|j|j |}y|j |||}||_||_ Wn t k r}zdSd}~XnX|jS)Ns([-])) find_codecrfrrFreENCODINGS_WITH_SMART_QUOTESrrr.ru _to_unicoderj Exception)r*ZproposederrorsrFZsmart_quotes_reZsmart_quotes_compiledrlrOrrrrks"     zUnicodeDammit._convert_fromcCs t|||S)zGiven a string and its encoding, decodes the string into Unicode. 
%encoding is a string recognized by encodings.aliases)rU)r*rWrr{rrrryszUnicodeDammit._to_unicodecCs|js dS|jjS)N)rCrirD)r*rrrdeclared_html_encodingsz$UnicodeDammit.declared_html_encodingcCs`|j|jj||pN|r*|j|jddpN|r@|j|jddpN|rL|jpN|}|r\|jSdS)N-r _)_codecCHARSET_ALIASESr!r)r;)r*charsetr+rrrrws zUnicodeDammit.find_codecc Cs<|s|Sd}ytj||}Wnttfk r6YnX|S)N)codecsr LookupError ValueError)r*rcodecrrrrs zUnicodeDammit._codeceuro20AC sbquo201Afnof192bdquo201Ehellip2026dagger2020Dagger2021circ2C6permil2030Scaron160lsaquo2039OElig152?#x17D17Dlsquo2018rsquo2019ldquo201Crdquo201Dbull2022ndash2013mdash2014tilde2DCtrade2122scaron161rsaquo203Aoelig153#x17E17EYumlr ) ZEUR,fz,,z...+z++^%SrZOEZrr*r}z--~z(TM)rrZoezY!cZGBP$ZYEN|z..z(th)z<>z1/4z1/2z3/4AZAECEIDNOUbBaZaerOin/y)rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrs€s‚sƒs„s…s†s‡sˆs‰sŠs‹sŒsŽs‘s’s“s”s•s–s—s˜s™sšs›sœsžsŸs s¡s¢s£s¤s¥s¦s§s¨s©sªs«s¬s­s®s¯s°s±s²s³s´sµs¶s·s¸s¹sºs»s¼s½s¾s¿sÀsÁsÂsÃsÄsÅsÆsÇsÈsÉsÊsËsÌsÍsÎsÏsÐsÑsÒsÓsÔsÕsÖs×sØsÙsÚsÛsÜsÝsÞsßsàrsâsãsäsåsæsçsèsésêsësìsísîsïsðsñsòsósôsõsös÷søsùsúsûsüsýsþ)zrrrRrrrSrrrQrrmutf8c Cs$|jddjdkrtd|jdkr0tdg}d }d }x|t|kr||}t|tsft|}||jkr||jkrxz|j D]$\}} } ||kr|| kr|| 7}PqWq>|d kr||j kr|j ||||j |j ||d 7}|}q>|d 7}q>W|d kr|S|j ||d d j |S)aFix characters from one encoding embedded in some other encoding. Currently the only situation supported is Windows-1252 (or its subset ISO-8859-1), embedded in UTF-8. The input must be a bytestring. If you've already converted the document to Unicode, you're too late. The output is a bytestring in which `embedded_encoding` characters have been converted to their `main_encoding` equivalents. 
r~r} windows-1252 windows_1252zPWindows-1252 and ISO-8859-1 are the only currently supported embedded encodings.rutf-8z4UTF-8 is the only currently supported main encoding.rrarmN)rr)rr) r)r;NotImplementedErrorrVrTr[ordFIRST_MULTIBYTE_MARKERLAST_MULTIBYTE_MARKERMULTIBYTE_MARKERS_AND_SIZESWINDOWS_1252_TO_UTF8rr) r#Zin_bytesZ main_encodingZembedded_encodingZ byte_chunksZ chunk_startposZbytestartendsizerrr detwingle s<      zUnicodeDammit.detwingle)rv)rv)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr)rr )rr)rrrR)rrrS)rrrQ)rrd)r5r6r7r8rrxrHrurkryrar|rwrrqrorrrrr9rrrrrrbEs`1        rb)r8Z __license__rZ html.entitiesrrrgstringZ chardet_typerr ImportErrorr Z iconv_codecrrprr\r^objectr r:rbrrrrs8    PK!7. T)__pycache__/diagnose.cpython-36.opt-1.pycnu[3 #6]@s dZdZddlZddlmZddlmZddlZddlmZm Z ddl m Z ddl Z ddl Z ddlZddlZddlZddlZddlZddlZdd Zd#d d ZGd ddeZddZdZdZd$ddZd%ddZd&ddZd'ddZd(d d!Zed"kreejj dS))z=Diagnostic functions, mainly for use when doing tech support.ZMITN)StringIO) HTMLParser) BeautifulSoup __version__)builder_registryc ;CsXtdttdtjdddg}x>|D]6}x0tjD]}||jkr6Pq6W|j|td|q*Wd|kr|jdy*dd l m }td d j t t |jWn*tk r}ztd WYd d }~XnXd|krydd l}td|jWn,tk r}ztdWYd d }~XnXt|dr4|j}n|jdsL|jdrdtd|tdd Sy:tjj|rtd|t|}|j}Wd QRXWntk rYnXtx|D]}td|d} yt||d} d} Wn8tk r"}ztd|tjWYd d }~XnX| rBtd|t| jtddqWd S)z/Diagnostic suite for isolating common problems.z'Diagnostic running on Beautiful Soup %szPython version %sz html.parserhtml5liblxmlz;I noticed that %s is not installed. Installing it may help.zlxml-xmlr)etreezFound lxml version %s.z.lxml is not installed or couldn't be imported.NzFound html5lib version %sz2html5lib is not installed or couldn't be imported.readzhttp:zhttps:z<"%s" looks like a URL. Beautiful Soup is not an HTTP client.zpYou need to use some other library to get the document behind the URL, and feed that document to Beautiful Soup.z7"%s" looks like a filename. 
Reading data from the file.z#Trying to parse your markup with %sF)featuresTz%s could not parse the markup.z#Here's what %s did with the markup:-P)printrsysversionrZbuildersr removeappendrr joinmapstrZ LXML_VERSION ImportErrorrhasattrr startswithospathexistsopen ValueErrorr Exception traceback print_excZprettify) dataZ basic_parsersnameZbuilderr erfpparsersuccesssoupr)/usr/lib/python3.6/diagnose.pydiagnosesj                     r+TcKsNddlm}x<|jt|fd|i|D]\}}td||j|jfq(WdS)zPrint out the lxml events that occur during parsing. This lets you see how lxml parses a document when no Beautiful Soup code is running. r)r htmlz %s, %4s, %sN)rr Z iterparserrtagtext)r"r,kwargsr Zeventelementr)r)r* lxml_traceZs $r1c@s`eZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ ddZ dS)AnnouncingParserz?Announces HTMLParser parse events, without doing anything else.cCs t|dS)N)r)selfsr)r)r*_pgszAnnouncingParser._pcCs|jd|dS)Nz%s START)r5)r3r#Zattrsr)r)r*handle_starttagjsz AnnouncingParser.handle_starttagcCs|jd|dS)Nz%s END)r5)r3r#r)r)r* handle_endtagmszAnnouncingParser.handle_endtagcCs|jd|dS)Nz%s DATA)r5)r3r"r)r)r* handle_datapszAnnouncingParser.handle_datacCs|jd|dS)Nz %s CHARREF)r5)r3r#r)r)r*handle_charrefsszAnnouncingParser.handle_charrefcCs|jd|dS)Nz %s ENTITYREF)r5)r3r#r)r)r*handle_entityrefvsz!AnnouncingParser.handle_entityrefcCs|jd|dS)Nz %s COMMENT)r5)r3r"r)r)r*handle_commentyszAnnouncingParser.handle_commentcCs|jd|dS)Nz%s DECL)r5)r3r"r)r)r* handle_decl|szAnnouncingParser.handle_declcCs|jd|dS)Nz%s UNKNOWN-DECL)r5)r3r"r)r)r* unknown_declszAnnouncingParser.unknown_declcCs|jd|dS)Nz%s PI)r5)r3r"r)r)r* handle_piszAnnouncingParser.handle_piN)__name__ __module__ __qualname____doc__r5r6r7r8r9r:r;r<r=r>r)r)r)r*r2dsr2cCst}|j|dS)zPrint out the HTMLParser events that occur during parsing. This lets you see how HTMLParser parses a document when no Beautiful Soup code is running. 
N)r2Zfeed)r"r&r)r)r*htmlparser_tracesrCZaeiouZbcdfghjklmnpqrstvwxyzcCs>d}x4t|D](}|ddkr$t}nt}|tj|7}qW|S)z#Generate a random word-like string.r)range _consonants_vowelsrandomchoice)lengthr4itr)r)r*rwords rOcCsdjddt|DS)z'Generate a random sentence-like string. css|]}ttjddVqdS)rP N)rOrJrandint).0rMr)r)r* szrsentence..)rrG)rLr)r)r* rsentencesrVcCsdddddddg}g}x~t|D]r}tjdd }|dkrRtj|}|jd |q |d krr|jttjd d q |d kr tj|}|jd|q Wddj|dS)z+Randomly generate an invalid HTML document.pZdivspanrMbZscripttablerz<%s>rPrFzz z)rGrJrSrKrrVr) num_elementsZ tag_nameselementsrMrKZtag_namer)r)r*rdocs   ra順c Cs(tdtt|}tdt|xdddgddgD]z}d}y"tj}t||}tj}d}Wn6tk r}ztd |tjWYd d }~XnX|r6td |||fq6Wd d l m }tj}|j |tj}td||d d l } | j }tj}|j|tj}td||d S)z.Very basic head-to-head performance benchmark.z1Comparative parser benchmark on Beautiful Soup %sz3Generated a large invalid HTML document (%d bytes).rr,rz html.parserFTz%s could not parse the markup.Nz"BS4+%s parsed the markup in %.2fs.r)r z$Raw lxml parsed the markup in %.2fs.z(Raw html5lib parsed the markup in %.2fs.)rrralentimerrr r!rr ZHTMLrrparse) r_r"r&r'ar(rZr$r rr)r)r*benchmark_parserss4      rgrcCsXtj}|j}t|}tt||d}tjd|||tj |}|j d|j dddS)N)bs4r"r&zbs4.BeautifulSoup(data, parser)Z cumulativez _html5lib|bs42) tempfileZNamedTemporaryFiler#radictrhcProfileZrunctxpstatsZStatsZ sort_statsZ print_stats)r_r&Z filehandlefilenamer"varsZstatsr)r)r*profiles  rp__main__)T)rD)rP)rW)rb)rbr)!rBZ __license__rliorZ html.parserrrhrrZ bs4.builderrrrmrJrjrdr rr+r1r2rCrIrHrOrVrargrpr?stdinr r)r)r)r*s8   C !     PK!7. 
T#__pycache__/diagnose.cpython-36.pycnu[3 #6]@s dZdZddlZddlmZddlmZddlZddlmZm Z ddl m Z ddl Z ddl Z ddlZddlZddlZddlZddlZddlZdd Zd#d d ZGd ddeZddZdZdZd$ddZd%ddZd&ddZd'ddZd(d d!Zed"kreejj dS))z=Diagnostic functions, mainly for use when doing tech support.ZMITN)StringIO) HTMLParser) BeautifulSoup __version__)builder_registryc ;CsXtdttdtjdddg}x>|D]6}x0tjD]}||jkr6Pq6W|j|td|q*Wd|kr|jdy*dd l m }td d j t t |jWn*tk r}ztd WYd d }~XnXd|krydd l}td|jWn,tk r}ztdWYd d }~XnXt|dr4|j}n|jdsL|jdrdtd|tdd Sy:tjj|rtd|t|}|j}Wd QRXWntk rYnXtx|D]}td|d} yt||d} d} Wn8tk r"}ztd|tjWYd d }~XnX| rBtd|t| jtddqWd S)z/Diagnostic suite for isolating common problems.z'Diagnostic running on Beautiful Soup %szPython version %sz html.parserhtml5liblxmlz;I noticed that %s is not installed. Installing it may help.zlxml-xmlr)etreezFound lxml version %s.z.lxml is not installed or couldn't be imported.NzFound html5lib version %sz2html5lib is not installed or couldn't be imported.readzhttp:zhttps:z<"%s" looks like a URL. Beautiful Soup is not an HTTP client.zpYou need to use some other library to get the document behind the URL, and feed that document to Beautiful Soup.z7"%s" looks like a filename. Reading data from the file.z#Trying to parse your markup with %sF)featuresTz%s could not parse the markup.z#Here's what %s did with the markup:-P)printrsysversionrZbuildersr removeappendrr joinmapstrZ LXML_VERSION ImportErrorrhasattrr startswithospathexistsopen ValueErrorr Exception traceback print_excZprettify) dataZ basic_parsersnameZbuilderr erfpparsersuccesssoupr)/usr/lib/python3.6/diagnose.pydiagnosesj                     r+TcKsNddlm}x<|jt|fd|i|D]\}}td||j|jfq(WdS)zPrint out the lxml events that occur during parsing. This lets you see how lxml parses a document when no Beautiful Soup code is running. 
r)r htmlz %s, %4s, %sN)rr Z iterparserrtagtext)r"r,kwargsr Zeventelementr)r)r* lxml_traceZs $r1c@s`eZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ ddZ dS)AnnouncingParserz?Announces HTMLParser parse events, without doing anything else.cCs t|dS)N)r)selfsr)r)r*_pgszAnnouncingParser._pcCs|jd|dS)Nz%s START)r5)r3r#Zattrsr)r)r*handle_starttagjsz AnnouncingParser.handle_starttagcCs|jd|dS)Nz%s END)r5)r3r#r)r)r* handle_endtagmszAnnouncingParser.handle_endtagcCs|jd|dS)Nz%s DATA)r5)r3r"r)r)r* handle_datapszAnnouncingParser.handle_datacCs|jd|dS)Nz %s CHARREF)r5)r3r#r)r)r*handle_charrefsszAnnouncingParser.handle_charrefcCs|jd|dS)Nz %s ENTITYREF)r5)r3r#r)r)r*handle_entityrefvsz!AnnouncingParser.handle_entityrefcCs|jd|dS)Nz %s COMMENT)r5)r3r"r)r)r*handle_commentyszAnnouncingParser.handle_commentcCs|jd|dS)Nz%s DECL)r5)r3r"r)r)r* handle_decl|szAnnouncingParser.handle_declcCs|jd|dS)Nz%s UNKNOWN-DECL)r5)r3r"r)r)r* unknown_declszAnnouncingParser.unknown_declcCs|jd|dS)Nz%s PI)r5)r3r"r)r)r* handle_piszAnnouncingParser.handle_piN)__name__ __module__ __qualname____doc__r5r6r7r8r9r:r;r<r=r>r)r)r)r*r2dsr2cCst}|j|dS)zPrint out the HTMLParser events that occur during parsing. This lets you see how HTMLParser parses a document when no Beautiful Soup code is running. N)r2Zfeed)r"r&r)r)r*htmlparser_tracesrCZaeiouZbcdfghjklmnpqrstvwxyzcCs>d}x4t|D](}|ddkr$t}nt}|tj|7}qW|S)z#Generate a random word-like string.r)range _consonants_vowelsrandomchoice)lengthr4itr)r)r*rwords rOcCsdjddt|DS)z'Generate a random sentence-like string. 
css|]}ttjddVqdS)rP N)rOrJrandint).0rMr)r)r* szrsentence..)rrG)rLr)r)r* rsentencesrVcCsdddddddg}g}x~t|D]r}tjdd }|dkrRtj|}|jd |q |d krr|jttjd d q |d kr tj|}|jd|q Wddj|dS)z+Randomly generate an invalid HTML document.pZdivspanrMbZscripttablerz<%s>rPrFzz z)rGrJrSrKrrVr) num_elementsZ tag_nameselementsrMrKZtag_namer)r)r*rdocs   ra順c Cs(tdtt|}tdt|xdddgddgD]z}d}y"tj}t||}tj}d}Wn6tk r}ztd |tjWYd d }~XnX|r6td |||fq6Wd d l m }tj}|j |tj}td||d d l } | j }tj}|j|tj}td||d S)z.Very basic head-to-head performance benchmark.z1Comparative parser benchmark on Beautiful Soup %sz3Generated a large invalid HTML document (%d bytes).rr,rz html.parserFTz%s could not parse the markup.Nz"BS4+%s parsed the markup in %.2fs.r)r z$Raw lxml parsed the markup in %.2fs.z(Raw html5lib parsed the markup in %.2fs.)rrralentimerrr r!rr ZHTMLrrparse) r_r"r&r'ar(rZr$r rr)r)r*benchmark_parserss4      rgrcCsXtj}|j}t|}tt||d}tjd|||tj |}|j d|j dddS)N)bs4r"r&zbs4.BeautifulSoup(data, parser)Z cumulativez _html5lib|bs42) tempfileZNamedTemporaryFiler#radictrhcProfileZrunctxpstatsZStatsZ sort_statsZ print_stats)r_r&Z filehandlefilenamer"varsZstatsr)r)r*profiles  rp__main__)T)rD)rP)rW)rb)rbr)!rBZ __license__rliorZ html.parserrrhrrZ bs4.builderrrrmrJrjrdr rr+r1r2rCrIrHrOrVrargrpr?stdinr r)r)r)r*s8   C !     
PK!{i(__pycache__/element.cpython-36.opt-1.pycnu[3 &6]$@s dZyddlmZWn.ek rBZzddlmZWYddZ[XnXddlZddlZddlZddl Z ddl m Z dZ ej ddkZejdZdd ZGd d d eZGd d d eZGdddeZGdddeZGddde ZGdddeZGdddeZGdddeZGdddeZGdddeZGdddeZGd d!d!eZGd"d#d#eeZ Gd$d%d%e Z!Gd&d'd'e!Z"Gd(d)d)e!Z#Gd*d+d+e#Z$Gd,d-d-e!Z%Gd.d/d/e!Z&Gd0d1d1e!Z'Gd2d3d3eZ(Gd4d5d5eZ)Gd6d7d7e*Z+dS)8ZMIT)CallableN)EntitySubstitutionzutf-8z\s+cs&tfdd}|jfdd}|S)z>Alias one attribute name to another for backward compatibilitycs t|S)N)getattr)self)attr/usr/lib/python3.6/element.pyaliassz_alias..aliascs t|S)N)setattr)r)rrr r s)propertysetter)rr r)rr _aliassrc@seZdZdddZdS)NamespacedAttributeNcCsV|dkrtj||}n*|dkr,tj||}ntj||d|}||_||_||_|S)N:)str__new__prefixname namespace)clsrrrobjrrr r"szNamespacedAttribute.__new__)N)__name__ __module__ __qualname__rrrrr r src@seZdZdZdS)%AttributeValueWithCharsetSubstitutionz=A stand-in object for a character encoding specified in HTML.N)rrr__doc__rrrr r/src@s eZdZdZddZddZdS)CharsetMetaAttributeValuezA generic stand-in for the value of a meta tag's 'charset' attribute. When Beautiful Soup parses the markup '', the value of the 'charset' attribute will be one of these objects. cCstj||}||_|S)N)rroriginal_value)rrrrrr r9s z!CharsetMetaAttributeValue.__new__cCs|S)Nr)rencodingrrr encode>sz CharsetMetaAttributeValue.encodeN)rrrrrr rrrr r2src@s.eZdZdZejdejZddZddZ dS)ContentMetaAttributeValueaA generic stand-in for the value of a meta tag's 'content' attribute. When Beautiful Soup parses the markup: The value of the 'content' attribute will be one of these objects. z((^|;)\s*charset=)([^;]*)cCs6|jj|}|dkr tjt|Stj||}||_|S)N) CHARSET_REsearchrrr)rrmatchrrrr rMs    z!ContentMetaAttributeValue.__new__csfdd}|jj||jS)Ncs|jdS)N)group)r$)rrr rewriteXsz1ContentMetaAttributeValue.encode..rewrite)r"subr)rrr'r)rr r Ws z ContentMetaAttributeValue.encodeN) rrrrrecompileMr"rr rrrr r!Bs r!c@sVeZdZdZeddgZedgZeddgZeddZ edd Z ed d Z d S) HTMLAwareEntitySubstitutiona%Entity substitution rules that are aware of some HTML quirks. 
Specifically, the contents of Hello, world! ztext/javascriptscripttype)rrfind)r rLrr r rtest_double_heads  z)HTMLTreeBuilderSmokeTest.test_double_headcCsjd}|j||j|}|jdd}|j|jt|jdd}|j||j|jdd}|j||jdS)Nz

foobaz

Zfoobar)textrCbaz)rrrgrr>rrr)r rrZcommentrCrjr r r test_comments     z%HTMLTreeBuilderSmokeTest.test_commentcCstd}d}|j||j||j|}|j|jj||j|}|j|jj||jd}|j|jjddS)zWhitespace must be preserved in
 and zN)rrrZpreZprettifyZtextarea)rZ
pre_markupZtextarea_markuprr
r
r-test_preserved_whitespace_in_pre_and_textareas




zFHTMLTreeBuilderSmokeTest.test_preserved_whitespace_in_pre_and_textareacCs.d}|j|d}|j|d}|j|dS)z+Inline elements can be nested indefinitely.zInside a B tagz!

A nested tag

z/

A doubly nested tag

N)r)r Zb_tagZ nested_b_tagZdouble_nested_b_tagr r rtest_nested_inline_elements s   z4HTMLTreeBuilderSmokeTest.test_nested_inline_elementscCs6|jd}|j}|j|jjjd|j|jjddS)zBlock elements can be nested.z*

Foo

ZFooN)r blockquoterrHbstring)r rrnr r r test_nested_block_level_elementss z9HTMLTreeBuilderSmokeTest.test_nested_block_level_elementscCsd}|j|d|jddS)z$One table can go inside another one.z[zh
Here's another table:
foo
Here's another table:
foo
z{
Foo
Bar
Baz
N)r)r rr r rtest_correctly_nested_tabless z5HTMLTreeBuilderSmokeTest.test_correctly_nested_tablescCs(d}|j|}|jdg|jjddS)Nz1
Zcssclass)rrZdiv)r rrr r r(test_deeply_nested_multivalued_attribute0s zAHTMLTreeBuilderSmokeTest.test_deeply_nested_multivalued_attributecCs(d}|j|}|jddg|jddS)Nzarors)rrrL)r rrr r r"test_multivalued_attribute_on_html8s z;HTMLTreeBuilderSmokeTest.test_multivalued_attribute_on_htmlcCs|jdddS)Nzz)r)r r r r3test_angle_brackets_in_attribute_values_are_escaped@szLHTMLTreeBuilderSmokeTest.test_angle_brackets_in_attribute_values_are_escapedcCs|jdddS)Nz$

• AT&T is in the s&p 500

z,

\u2022 AT&T is in the s&p 500

)r)r r r r3test_strings_resembling_character_entity_referencesCszLHTMLTreeBuilderSmokeTest.test_strings_resembling_character_entity_referencescCs"d}|j|}|jd|jjdS)Nz%

“Hello” -☃

u“Hello” -☃)rrrHrp)r rrr r r*test_entities_in_foreign_document_encodingKs zCHTMLTreeBuilderSmokeTest.test_entities_in_foreign_document_encodingcCs8d}|jd||jd||jd||jd|dS)Nu

z

z

z

z

)r)r expectr r r0test_entities_in_attributes_converted_to_unicodeWs    zIHTMLTreeBuilderSmokeTest.test_entities_in_attributes_converted_to_unicodecCs8d}|jd||jd||jd||jd|dS)Nu

piñata

z

piñata

z

piñata

z

piñata

z

piñata

)r)r rzr r r*test_entities_in_text_converted_to_unicode^s    zCHTMLTreeBuilderSmokeTest.test_entities_in_text_converted_to_unicodecCs|jdddS)Nz#

I said "good day!"

z

I said "good day!"

)r)r r r r,test_quot_entity_converted_to_quotation_markeszEHTMLTreeBuilderSmokeTest.test_quot_entity_converted_to_quotation_markcCs,d}|jd||jd||jd|dS)Nu�z�z�z �)r)r rzr r rtest_out_of_range_entityis  z1HTMLTreeBuilderSmokeTest.test_out_of_range_entitycCs<|jd}|jd|jjjj|jd|jj|j|dS)zDMostly to prevent a recurrence of a bug in the html5lib treebuilder.z!

foo

rHN)rrZh2rprr8rHr)r rr r rtest_multipart_stringsos z/HTMLTreeBuilderSmokeTest.test_multipart_stringscCs|jdd|jdddS)zqVerify consistent handling of empty-element tags, no matter how they come in through the markup. z


z


N)r)r r r rr9vs cCs,d}|j|}|jd|jj|j|dS)z8Prevent recurrence of a bug in the html5lib treebuilder.z? foo N)rZassertNotEqualrLbodyr)r contentrr r r#test_head_tag_between_head_and_body}s z N)rrZarticle)r rrr r rtest_multiple_copies_of_a_tags  z6HTMLTreeBuilderSmokeTest.test_multiple_copies_of_a_tagcCs^d}|j|}|j||j|j}|jd|jd|jd|jd|jd|jddS) zParsers don't need to *understand* namespaces, but at the very least they should not choke on namespaces or lose data.s4zhttp://www.w3.org/1999/xhtmlZxmlnsz"http://www.w3.org/1998/Math/MathMLz xmlns:mathmlzhttp://www.w3.org/2000/svgz xmlns:svgN)rrrVrL)r rrrLr r rtest_basic_namespacess z.HTMLTreeBuilderSmokeTest.test_basic_namespacescCs(d}|j|}|jddg|jddS)NsrCbarrs)rrru)r rrr r r-test_multivalued_attribute_value_becomes_lists zFHTMLTreeBuilderSmokeTest.test_multivalued_attribute_value_becomes_listcCs"d}|j|}|jd|jjdS)NuDSacré bleu!u Sacré bleu!)rrrrp)r rrr r rtest_can_parse_unicode_documents z8HTMLTreeBuilderSmokeTest.test_can_parse_unicode_documentcCs*td}|jd|d}|j|jddS)z2Parsers should be able to work with SoupStrainers.roz&A bold statement)Z parse_onlyz boldN)rrrr)r Zstrainerrr r rtest_soupstrainersz*HTMLTreeBuilderSmokeTest.test_soupstrainercCs|jdddS)Nzz)r)r r r r7test_single_quote_attribute_values_become_double_quotesszPHTMLTreeBuilderSmokeTest.test_single_quote_attribute_values_become_double_quotescCsd}|j|dS)Nz'a)r)r rir r r7test_attribute_values_with_nested_quotes_are_left_aloneszPHTMLTreeBuilderSmokeTest.test_attribute_values_with_nested_quotes_are_left_alonecCs.d}|j|}d|jd<|j|jjddS)Nz'azBrawls happen at "Bob's Bar"attrz:a)rrCrr)r rirr r r:test_attribute_values_with_double_nested_quotes_get_quoteds   zSHTMLTreeBuilderSmokeTest.test_attribute_values_with_double_nested_quotes_get_quotedcCs|jdd|jdddS)Nz+z/z.fooz2foo)r)r r r r.test_ampersand_in_attribute_value_gets_escapeds zGHTMLTreeBuilderSmokeTest.test_ampersand_in_attribute_value_gets_escapedcCs|jddS)Nz/)r)r r r 
r7test_escaped_ampersand_in_attribute_value_is_left_aloneszPHTMLTreeBuilderSmokeTest.test_escaped_ampersand_in_attribute_value_is_left_alonecCsd}d}|j||dS)Nz-

<<sacré bleu!>>

u#

<<sacré bleu!>>

)r)r riexpectedr r r1test_entities_in_strings_converted_during_parsingszJHTMLTreeBuilderSmokeTest.test_entities_in_strings_converted_during_parsingcCs"d}|j|}|j|jjddS)Ns

Foo

u ‘Foo’)rrrHrp)r Zquoterr r r)test_smart_quotes_converted_on_the_way_ins  zBHTMLTreeBuilderSmokeTest.test_smart_quotes_converted_on_the_way_incCs|jd}|j|jjddS)Nz   r:u  )rrrurp)r rr r r0test_non_breaking_spaces_converted_on_the_way_ins zIHTMLTreeBuilderSmokeTest.test_non_breaking_spaces_converted_on_the_way_incCs0d}djd}|j|}|j|jjd|dS)Nz-

<<sacré bleu!>>

u#

<<sacré bleu!>>

zutf-8)rVrrrH)r rirrr r r&test_entities_converted_on_the_way_outs  z?HTMLTreeBuilderSmokeTest.test_entities_converted_on_the_way_outcCsHd}|jd}|j|}|jd}|jdd}|jd}|j||dS)Nu

Sacré bleu!

z iso-8859-1zutf-8z ISO-Latin-1)rVrrWr)r unicode_htmlZiso_latin_htmlrresultrr r rtest_real_iso_latin_documents     z5HTMLTreeBuilderSmokeTest.test_real_iso_latin_documentcCsLd}|jd}|j|}|j|jd|jd|j|jd|jddS)Nsk
Shift-JISŃR[fBOꂽ{̃t@CłB
z shift-jiszutf-8euc_jp)rrrrV)r shift_jis_htmlrrr r rtest_real_shift_jis_documents   z5HTMLTreeBuilderSmokeTest.test_real_shift_jis_documentcCs4d}|j|dd}|j|jd|jdjddS)NsHebrew (ISO 8859-8) in Visual Directionality

Hebrew (ISO 8859-8) in Visual Directionality

z iso8859-8)Z from_encodingzutf-8)rrrVr)r Zhebrew_documentrr r rtest_real_hebrew_document$s  z2HTMLTreeBuilderSmokeTest.test_real_hebrew_documentcCs`d}d|}|j|}|jdddi}|d}|jd||jt|t|jd|jd dS) NzEzj %s Shift-JIS markup goes here.r.z http-equivz Content-typerztext/html; charset=x-sjisztext/html; charset=utf8r[)rrgrrb isinstancerrV)r meta_tagrr parsed_metarr r r'test_meta_tag_reflects_current_encoding1s  z@HTMLTreeBuilderSmokeTest.test_meta_tag_reflects_current_encodingcCs^d}d|}|j|}|jddd}|d}|jd||jt|t|jd|jddS) Nz'zj %s Shift-JIS markup goes here.r.encoding)idcharsetzx-sjisr[)rrgrrbrrrV)r rrrrrr r r3test_html5_style_meta_tag_reflects_current_encodingMs  zLHTMLTreeBuilderSmokeTest.test_html5_style_meta_tag_reflects_current_encodingcCs*|jd}d|jd<|jd|jjdS)Nz textrrCztext)rrurr)r datar r r5test_tag_with_no_attributes_can_have_attributes_addedes  zNHTMLTreeBuilderSmokeTest.test_tag_with_no_attributes_can_have_attributes_addedN);rrr __doc__r9rBrKrDrMrOrPrQrRrSrXrZr\r_r`rarcrdrhrkrlrmrqrrrtrvrwrxryr{r|r}r~rrrrrrrrrrrrrrrrrrrrrrr r r rr"?sr             r"c@seZdZddZddZddZddZd d Zd d Zd dZ ddZ ddZ ddZ ddZ ddZddZddZddZdd Zd!d"Zd#d$Zd%S)&XMLTreeBuilderSmokeTestcCsF|jd}tj|d}tj|}|j|jt|j|j|jdS)Nz foor:)rr;r<r=rr>rr)r r?r@rAr r rrBls    z9XMLTreeBuilderSmokeTest.test_pickle_and_unpickle_identitycCs|jd}|j|jddS)Nzs. )rrrV)r rr r rtest_docstring_generatedus z0XMLTreeBuilderSmokeTest.test_docstring_generatedcCs$d}|j|}|j||jddS)Ns, r[)rrrV)r rrr r rtest_xml_declarationzs z,XMLTreeBuilderSmokeTest.test_xml_declarationcCs$d}|j|}|j||jddS)Ns< r[)rrrV)r rrr r rr\s z3XMLTreeBuilderSmokeTest.test_processing_instructioncCs$d}|j|}|j|jd|dS)zGA real XHTML document should come out *exactly* the same as it went in.s Hello. Goodbye. 
zutf-8N)rrrV)r rrr r rrXs z0XMLTreeBuilderSmokeTest.test_real_xhtml_documentcCs"d}|j|}|j||jdS)Ns )rrrV)r docrr r rtest_nested_namespacess z.XMLTreeBuilderSmokeTest.test_nested_namespacescCs0d}t|d}d|j_|j}|jd|kdS)Nz/ zlxml-xmlzconsole.log("< < hey > > ");s< < hey > >)rrerprVrb)r rrZencodedr r r5test_formatter_processes_script_tag_for_xml_documentss  zMXMLTreeBuilderSmokeTest.test_formatter_processes_script_tag_for_xml_documentscCs"d}|j|}|jd|jjdS)Nu?Sacré bleu!u Sacré bleu!)rrrootrp)r rrr r rrs z7XMLTreeBuilderSmokeTest.test_can_parse_unicode_documentcCs$d}|j|}|jt|j|dS)Nzb2012-07-02T20:33:42Zcd)rrrFZrss)r rrr r rtest_popping_namespaced_tags z3XMLTreeBuilderSmokeTest.test_popping_namespaced_tagcCs |jd}|j|jdddS)Nzlatin1s/ )rrrV)r rr r r(test_docstring_includes_correct_encodings z@XMLTreeBuilderSmokeTest.test_docstring_includes_correct_encodingcCs0dddd}|j|}|j|jd|dS) z 0r: szutf-8Ni)rrrV)r rrr r rtest_large_xml_documents z/XMLTreeBuilderSmokeTest.test_large_xml_documentcCs|jdd|jddS)Nz

z

z

foo

)r)r r r r9test_tags_are_empty_element_if_and_only_if_they_are_emptys zQXMLTreeBuilderSmokeTest.test_tags_are_empty_element_if_and_only_if_they_are_emptycCs8d}|j|}|j}|jd|d|jd|ddS)NzThis tag is in the a namespaceThis tag is in the b namespacezhttp://example.com/zxmlns:azhttp://example.net/zxmlns:b)rrr)r rrrr r rtest_namespaces_are_preserveds  z5XMLTreeBuilderSmokeTest.test_namespaces_are_preservedcCs$d}|j|}|jt|j|dS)NzN

20010504

)rrrFrH)r rrr r rtest_closing_namespaced_tags z3XMLTreeBuilderSmokeTest.test_closing_namespaced_tagcCs$d}|j|}|jt|j|dS)Nzs)rrrFrC)r rrr r rtest_namespaced_attributess z2XMLTreeBuilderSmokeTest.test_namespaced_attributescCs$d}|j|}|jt|j|dS)Nzbar)rrrFrC)r rrr r r(test_namespaced_attributes_xml_namespaces z@XMLTreeBuilderSmokeTest.test_namespaced_attributes_xml_namespacecCsd}|j|}|jdt|jd|jdt|jd|jdt|jd|jdt|jddd |jdt|jddgdS) Na foo bar baz tagr:zns1:tagzns2:tagvalue)key)rrrGrY)r rrr r rtest_find_by_prefixed_names  z2XMLTreeBuilderSmokeTest.test_find_by_prefixed_namecCs2d}|j|}|j}tj|}|j|j|jdS)Nzf )rZdocumentr]rprefix)r ZxmlrrZ duplicater r r!test_copy_tag_preserves_namespaces   z9XMLTreeBuilderSmokeTest.test_copy_tag_preserves_namespaceN)rrr rBrrr\rXrrrrrrrrrrrrrr r r rrjs$     rc@s8eZdZdZddZddZddZdd Zd d Zd S) HTML5TreeBuilderSmokeTestz2Smoke test for a tree builder that supports HTML5.cCsdS)Nr )r r r rrXsz2HTML5TreeBuilderSmokeTest.test_real_xhtml_documentcCs"d}|j|}|jd|jjdS)Nz
zhttp://www.w3.org/1999/xhtml)rrru namespace)r rrr r rtest_html_tags_have_namespaces z7HTML5TreeBuilderSmokeTest.test_html_tags_have_namespacecCs6d}|j|}d}|j||jj|j||jjdS)Nzzhttp://www.w3.org/2000/svg)rrZsvgrZcircle)r rrrr r rtest_svg_tags_have_namespace s  z6HTML5TreeBuilderSmokeTest.test_svg_tags_have_namespacecCs6d}|j|}d}|j||jj|j||jjdS)Nz5z"http://www.w3.org/1998/Math/MathML)rrZmathrZmsqrt)r rrrr r rtest_mathml_tags_have_namespaces  z9HTML5TreeBuilderSmokeTest.test_mathml_tags_have_namespacecCsPd}|j|}|jt|jdt|j|jdd|jd|jdjjdS)Nz3rz$?xml version="1.0" encoding="utf-8"?rL)rrbrrErrrr8)r rrr r r$test_xml_declaration_becomes_comments  z>HTML5TreeBuilderSmokeTest.test_xml_declaration_becomes_commentN) rrr rrXrrrrr r r rrs rcsddfdd}|S)Nc_sdS)Nr )Ztestargsrr r rnothing!szskipIf..nothingcsrS|SdS)Nr )Z test_item) conditionrr r decorator$szskipIf..decoratorr )rreasonrr )rrrskipIf sr)rZ __license__r;r] functoolsZunittestrZbs4rZ bs4.elementrrrrrZ bs4.builderr r r objectr"rrrr r r rs(   %/#PK!ϫ"__pycache__/testing.cpython-36.pycnu[3 "6]~@sdZdZddlZddlZddlZddlZddlmZddlmZddl m Z m Z m Z m Z mZddlmZeZGdd d ejZGd d d eZGd d d eZGdddeZddZdS)zHelper classes for tests.ZMITN)TestCase) BeautifulSoup)CharsetMetaAttributeValueCommentContentMetaAttributeValueDoctype SoupStrainer)HTMLParserTreeBuilderc@s:eZdZeddZddZddZd dd Zd d ZdS) SoupTestcCstS)N)default_builder)selfr /usr/lib/python3.6/testing.pyr szSoupTest.default_buildercKs"|jd|j}t|fd|i|S)z*Build a Beautiful Soup object from markup.builder)popr r)r markupkwargsrr r rsoup sz SoupTest.soupcCs |jj|S)z[Turn an HTML fragment into a document. The details depend on the builder. )r Ztest_fragment_to_document)r rr r r document_for%szSoupTest.document_forNcCs8|j}t||d}|dkr|}|j|j|j|dS)N)r)r r assertEqualdecoder)r Zto_parseZcompare_parsed_torobjr r rassertSoupEquals,s  zSoupTest.assertSoupEqualscCs<d}x2|jD](}|r0|j||j|j||j|}q WdS)zyEnsure that next_element and previous_element are properly set for all descendants of the given element. 
N)Z descendantsr next_elementprevious_element)r elementZearlierer r rassertConnectedness4s  zSoupTest.assertConnectedness)N) __name__ __module__ __qualname__propertyr rrrrr r r rr s   r c@seZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ ddZ ddZddZddZddZddZd d!Zd"d#Zd$d%Zd&d'Zd(d)Zd*d+Zd,d-Zd.d/Zd0d1Zd2d3Zd4d5Zd6d7Zd8d9Zd:d;Z dd?Z"d@dAZ#dBdCZ$dDdEZ%dFdZdGdHZ&dIdJZ'dKdLZ(dMdNZ)dOdPZ*dQdRZ+dSdTZ,dUdVZ-dWdXZ.dYdZZ/d[d\Z0d]d^Z1d_d`Z2dadbZ3dcddZ4dedfZ5dgdhZ6didjZ7dkdlZ8dmdnZ9dodpZ:dqS)rHTMLTreeBuilderSmokeTestaCA basic test of a treebuilder's competence. Any HTML treebuilder, present or future, should be able to pass these tests. With invalid markup, there's room for interpretation, and different parsers can handle it differently. But with the markup in these tests, there's not much room for interpretation. cCs4x.dD]&}|jd}|j|}|jd|jqWdS)zmVerify that all HTML4 and HTML5 empty element (aka void element) tags are handled correctly. areabasebrcolembedhrimginputkeygenlinkmenuitemmetaparamsourcetrackwbrspacerframeTN)r#r$r%r&r'r(r)r*r+r,r-r.r/r0r1r2r3r4)rnew_tagris_empty_element)r namerr6r r rtest_empty_element_tagsIs   z0HTMLTreeBuilderSmokeTest.test_empty_element_tagscCsF|jd}tj|d}tj|}|j|jt|j|j|jdS)Nz foo)rpickledumpsloadsr __class__rr)r treedumpedloadedr r r!test_pickle_and_unpickle_identityUs    z:HTMLTreeBuilderSmokeTest.test_pickle_and_unpickle_identitycCsf|j|\}}|jd}|j|jt|j|||jt|dt|||j|jjdddS)z8Assert that a given doctype string is handled correctly.rNfoo)_document_with_doctypecontentsrr>rstrlenp)r doctype_fragmentZ doctype_strrdoctyper r rassertDoctypeHandled^s   z-HTMLTreeBuilderSmokeTest.assertDoctypeHandledcCs"d|}|d}|j|}||fS)z5Generate and parse a document with the given doctype.z z

foo

)r)r rIrJrrr r rrDls z/HTMLTreeBuilderSmokeTest._document_with_doctypecCs|jd|jddS)z?Make sure normal, everyday HTML doctypes are handled correctly.htmlz4html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"N)rK)r r r rtest_normal_doctypesss z-HTMLTreeBuilderSmokeTest.test_normal_doctypescCs(|jd}|jd}|jd|jdS)Nz rr5)rrErstrip)r rrJr r rtest_empty_doctypeys  z+HTMLTreeBuilderSmokeTest.test_empty_doctypecCsd}|j|dS)Nznhtml PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd")rK)r rJr r rtest_public_doctype_with_url~sz5HTMLTreeBuilderSmokeTest.test_public_doctype_with_urlcCs|jddS)Nz$foo SYSTEM "http://www.example.com/")rK)r r r rtest_system_doctypesz,HTMLTreeBuilderSmokeTest.test_system_doctypecCs|jddS)Nz#xsl:stylesheet SYSTEM "htmlent.dtd")rK)r r r rtest_namespaced_system_doctypesz7HTMLTreeBuilderSmokeTest.test_namespaced_system_doctypecCs|jddS)Nz#xsl:stylesheet PUBLIC "htmlent.dtd")rK)r r r rtest_namespaced_public_doctypesz7HTMLTreeBuilderSmokeTest.test_namespaced_public_doctypecCs4d}|j|}|j|jdjdd|jdddS)zJA real XHTML document should come out more or less the same as it went in.s Hello. Goodbye. zutf-8 N)rrencodereplace)r rrr r rtest_real_xhtml_documents  z1HTMLTreeBuilderSmokeTest.test_real_xhtml_documentcCs(d}|j|}|jdt|jddS)ztWhen a namespaced XML document is parsed as HTML it should be treated as HTML with weird tag names. s.contentr:zns1:fooN)rrrGfind_all)r rrr r rtest_namespaced_htmls z-HTMLTreeBuilderSmokeTest.test_namespaced_htmlcCsBd}|j|}|j||jd}|j|}|j||jddS)Nzsutf8)rrrrV)r rrr r rtest_processing_instructions   z4HTMLTreeBuilderSmokeTest.test_processing_instructioncCstj|jdS)zMake sure you can copy the tree builder. This is important because the builder is part of a BeautifulSoup object, and we want to be able to copy that. N)copydeepcopyr )r r r r test_deepcopysz&HTMLTreeBuilderSmokeTest.test_deepcopycCs.|jd}|j|jj|jt|jddS)zA

tag is never designated as an empty-element tag. Even if the markup shows it as an empty-element tag, it shouldn't be presented that way. z

z

N)rZ assertFalserHr7rrF)r rr r r!test_p_tag_is_never_empty_elements z:HTMLTreeBuilderSmokeTest.test_p_tag_is_never_empty_elementcCs(|jdd|jdd|jdddS)zA tag that's not closed by the end of the document should be closed. This applies to all tags except empty-element tags. z

z

zzz
z
N)r)r r r rtest_unclosed_tags_get_closeds  z6HTMLTreeBuilderSmokeTest.test_unclosed_tags_get_closedcCs.|jd}|j|jj|jt|jddS)zA
tag is designated as an empty-element tag. Some parsers treat

as one
tag, some parsers as two tags, but it should always be an empty-element tag. z

z
N)r assertTruer%r7rrF)r rr r r#test_br_is_always_empty_element_tags z)r)r r r rtest_nested_formatting_elementssz8HTMLTreeBuilderSmokeTest.test_nested_formatting_elementscCs(d}|j|}|jd|jdddS)Nz Ordinary HEAD element test Hello, world! ztext/javascriptscripttype)rrfind)r rLrr r rtest_double_heads  z)HTMLTreeBuilderSmokeTest.test_double_headcCsjd}|j||j|}|jdd}|j|jt|jdd}|j||j|jdd}|j||jdS)Nz

foobaz

Zfoobar)textrCbaz)rrrgrr>rrr)r rrZcommentrCrjr r r test_comments     z%HTMLTreeBuilderSmokeTest.test_commentcCstd}d}|j||j||j|}|j|jj||j|}|j|jj||jd}|j|jjddS)zWhitespace must be preserved in
 and zN)rrrZpreZprettifyZtextarea)rZ
pre_markupZtextarea_markuprr
r
r-test_preserved_whitespace_in_pre_and_textareas




zFHTMLTreeBuilderSmokeTest.test_preserved_whitespace_in_pre_and_textareacCs.d}|j|d}|j|d}|j|dS)z+Inline elements can be nested indefinitely.zInside a B tagz!

A nested tag

z/

A doubly nested tag

N)r)r Zb_tagZ nested_b_tagZdouble_nested_b_tagr r rtest_nested_inline_elements s   z4HTMLTreeBuilderSmokeTest.test_nested_inline_elementscCs6|jd}|j}|j|jjjd|j|jjddS)zBlock elements can be nested.z*

Foo

ZFooN)r blockquoterrHbstring)r rrnr r r test_nested_block_level_elementss z9HTMLTreeBuilderSmokeTest.test_nested_block_level_elementscCsd}|j|d|jddS)z$One table can go inside another one.z[zh
Here's another table:
foo
Here's another table:
foo
z{
Foo
Bar
Baz
N)r)r rr r rtest_correctly_nested_tabless z5HTMLTreeBuilderSmokeTest.test_correctly_nested_tablescCs(d}|j|}|jdg|jjddS)Nz1
Zcssclass)rrZdiv)r rrr r r(test_deeply_nested_multivalued_attribute0s zAHTMLTreeBuilderSmokeTest.test_deeply_nested_multivalued_attributecCs(d}|j|}|jddg|jddS)Nzarors)rrrL)r rrr r r"test_multivalued_attribute_on_html8s z;HTMLTreeBuilderSmokeTest.test_multivalued_attribute_on_htmlcCs|jdddS)Nzz)r)r r r r3test_angle_brackets_in_attribute_values_are_escaped@szLHTMLTreeBuilderSmokeTest.test_angle_brackets_in_attribute_values_are_escapedcCs|jdddS)Nz$

• AT&T is in the s&p 500

z,

\u2022 AT&T is in the s&p 500

)r)r r r r3test_strings_resembling_character_entity_referencesCszLHTMLTreeBuilderSmokeTest.test_strings_resembling_character_entity_referencescCs"d}|j|}|jd|jjdS)Nz%

“Hello” -☃

u“Hello” -☃)rrrHrp)r rrr r r*test_entities_in_foreign_document_encodingKs zCHTMLTreeBuilderSmokeTest.test_entities_in_foreign_document_encodingcCs8d}|jd||jd||jd||jd|dS)Nu

z

z

z

z

)r)r expectr r r0test_entities_in_attributes_converted_to_unicodeWs    zIHTMLTreeBuilderSmokeTest.test_entities_in_attributes_converted_to_unicodecCs8d}|jd||jd||jd||jd|dS)Nu

piñata

z

piñata

z

piñata

z

piñata

z

piñata

)r)r rzr r r*test_entities_in_text_converted_to_unicode^s    zCHTMLTreeBuilderSmokeTest.test_entities_in_text_converted_to_unicodecCs|jdddS)Nz#

I said "good day!"

z

I said "good day!"

)r)r r r r,test_quot_entity_converted_to_quotation_markeszEHTMLTreeBuilderSmokeTest.test_quot_entity_converted_to_quotation_markcCs,d}|jd||jd||jd|dS)Nu�z�z�z �)r)r rzr r rtest_out_of_range_entityis  z1HTMLTreeBuilderSmokeTest.test_out_of_range_entitycCs<|jd}|jd|jjjj|jd|jj|j|dS)zDMostly to prevent a recurrence of a bug in the html5lib treebuilder.z!

foo

rHN)rrZh2rprr8rHr)r rr r rtest_multipart_stringsos z/HTMLTreeBuilderSmokeTest.test_multipart_stringscCs|jdd|jdddS)zqVerify consistent handling of empty-element tags, no matter how they come in through the markup. z


z


N)r)r r r rr9vs cCs,d}|j|}|jd|jj|j|dS)z8Prevent recurrence of a bug in the html5lib treebuilder.z? foo N)rZassertNotEqualrLbodyr)r contentrr r r#test_head_tag_between_head_and_body}s z N)rrZarticle)r rrr r rtest_multiple_copies_of_a_tags  z6HTMLTreeBuilderSmokeTest.test_multiple_copies_of_a_tagcCs^d}|j|}|j||j|j}|jd|jd|jd|jd|jd|jddS) zParsers don't need to *understand* namespaces, but at the very least they should not choke on namespaces or lose data.s4zhttp://www.w3.org/1999/xhtmlZxmlnsz"http://www.w3.org/1998/Math/MathMLz xmlns:mathmlzhttp://www.w3.org/2000/svgz xmlns:svgN)rrrVrL)r rrrLr r rtest_basic_namespacess z.HTMLTreeBuilderSmokeTest.test_basic_namespacescCs(d}|j|}|jddg|jddS)NsrCbarrs)rrru)r rrr r r-test_multivalued_attribute_value_becomes_lists zFHTMLTreeBuilderSmokeTest.test_multivalued_attribute_value_becomes_listcCs"d}|j|}|jd|jjdS)NuDSacré bleu!u Sacré bleu!)rrrrp)r rrr r rtest_can_parse_unicode_documents z8HTMLTreeBuilderSmokeTest.test_can_parse_unicode_documentcCs*td}|jd|d}|j|jddS)z2Parsers should be able to work with SoupStrainers.roz&A bold statement)Z parse_onlyz boldN)rrrr)r Zstrainerrr r rtest_soupstrainersz*HTMLTreeBuilderSmokeTest.test_soupstrainercCs|jdddS)Nzz)r)r r r r7test_single_quote_attribute_values_become_double_quotesszPHTMLTreeBuilderSmokeTest.test_single_quote_attribute_values_become_double_quotescCsd}|j|dS)Nz'a)r)r rir r r7test_attribute_values_with_nested_quotes_are_left_aloneszPHTMLTreeBuilderSmokeTest.test_attribute_values_with_nested_quotes_are_left_alonecCs.d}|j|}d|jd<|j|jjddS)Nz'azBrawls happen at "Bob's Bar"attrz:a)rrCrr)r rirr r r:test_attribute_values_with_double_nested_quotes_get_quoteds   zSHTMLTreeBuilderSmokeTest.test_attribute_values_with_double_nested_quotes_get_quotedcCs|jdd|jdddS)Nz+z/z.fooz2foo)r)r r r r.test_ampersand_in_attribute_value_gets_escapeds zGHTMLTreeBuilderSmokeTest.test_ampersand_in_attribute_value_gets_escapedcCs|jddS)Nz/)r)r r r 
r7test_escaped_ampersand_in_attribute_value_is_left_aloneszPHTMLTreeBuilderSmokeTest.test_escaped_ampersand_in_attribute_value_is_left_alonecCsd}d}|j||dS)Nz-

<<sacré bleu!>>

u#

<<sacré bleu!>>

)r)r riexpectedr r r1test_entities_in_strings_converted_during_parsingszJHTMLTreeBuilderSmokeTest.test_entities_in_strings_converted_during_parsingcCs"d}|j|}|j|jjddS)Ns

Foo

u ‘Foo’)rrrHrp)r Zquoterr r r)test_smart_quotes_converted_on_the_way_ins  zBHTMLTreeBuilderSmokeTest.test_smart_quotes_converted_on_the_way_incCs|jd}|j|jjddS)Nz   r:u  )rrrurp)r rr r r0test_non_breaking_spaces_converted_on_the_way_ins zIHTMLTreeBuilderSmokeTest.test_non_breaking_spaces_converted_on_the_way_incCs0d}djd}|j|}|j|jjd|dS)Nz-

<<sacré bleu!>>

u#

<<sacré bleu!>>

zutf-8)rVrrrH)r rirrr r r&test_entities_converted_on_the_way_outs  z?HTMLTreeBuilderSmokeTest.test_entities_converted_on_the_way_outcCsHd}|jd}|j|}|jd}|jdd}|jd}|j||dS)Nu

Sacré bleu!

z iso-8859-1zutf-8z ISO-Latin-1)rVrrWr)r unicode_htmlZiso_latin_htmlrresultrr r rtest_real_iso_latin_documents     z5HTMLTreeBuilderSmokeTest.test_real_iso_latin_documentcCsLd}|jd}|j|}|j|jd|jd|j|jd|jddS)Nsk
Shift-JISŃR[fBOꂽ{̃t@CłB
z shift-jiszutf-8euc_jp)rrrrV)r shift_jis_htmlrrr r rtest_real_shift_jis_documents   z5HTMLTreeBuilderSmokeTest.test_real_shift_jis_documentcCsBd}|j|dd}|jdks t|j|jd|jdjddS)NsHebrew (ISO 8859-8) in Visual Directionality

Hebrew (ISO 8859-8) in Visual Directionality

iso8859-8)Z from_encoding iso-8859-8zutf-8)rr)rZoriginal_encodingAssertionErrorrrVr)r Zhebrew_documentrr r rtest_real_hebrew_document$s z2HTMLTreeBuilderSmokeTest.test_real_hebrew_documentcCs`d}d|}|j|}|jdddi}|d}|jd||jt|t|jd|jd dS) NzEzj %s Shift-JIS markup goes here.r.z http-equivz Content-typerztext/html; charset=x-sjisztext/html; charset=utf8r[)rrgrrb isinstancerrV)r meta_tagrr parsed_metarr r r'test_meta_tag_reflects_current_encoding1s  z@HTMLTreeBuilderSmokeTest.test_meta_tag_reflects_current_encodingcCs^d}d|}|j|}|jddd}|d}|jd||jt|t|jd|jddS) Nz'zj %s Shift-JIS markup goes here.r.encoding)idcharsetzx-sjisr[)rrgrrbrrrV)r rrrrrr r r3test_html5_style_meta_tag_reflects_current_encodingMs  zLHTMLTreeBuilderSmokeTest.test_html5_style_meta_tag_reflects_current_encodingcCs*|jd}d|jd<|jd|jjdS)Nz textrrCztext)rrurr)r datar r r5test_tag_with_no_attributes_can_have_attributes_addedes  zNHTMLTreeBuilderSmokeTest.test_tag_with_no_attributes_can_have_attributes_addedN);rrr __doc__r9rBrKrDrMrOrPrQrRrSrXrZr\r_r`rarcrdrhrkrlrmrqrrrtrvrwrxryr{r|r}r~rrrrrrrrrrrrrrrrrrrrrrr r r rr"?sr             r"c@seZdZddZddZddZddZd d Zd d Zd dZ ddZ ddZ ddZ ddZ ddZddZddZddZdd Zd!d"Zd#d$Zd%S)&XMLTreeBuilderSmokeTestcCsF|jd}tj|d}tj|}|j|jt|j|j|jdS)Nz foor:)rr;r<r=rr>rr)r r?r@rAr r rrBls    z9XMLTreeBuilderSmokeTest.test_pickle_and_unpickle_identitycCs|jd}|j|jddS)Nzs. )rrrV)r rr r rtest_docstring_generatedus z0XMLTreeBuilderSmokeTest.test_docstring_generatedcCs$d}|j|}|j||jddS)Ns, r[)rrrV)r rrr r rtest_xml_declarationzs z,XMLTreeBuilderSmokeTest.test_xml_declarationcCs$d}|j|}|j||jddS)Ns< r[)rrrV)r rrr r rr\s z3XMLTreeBuilderSmokeTest.test_processing_instructioncCs$d}|j|}|j|jd|dS)zGA real XHTML document should come out *exactly* the same as it went in.s Hello. Goodbye. 
zutf-8N)rrrV)r rrr r rrXs z0XMLTreeBuilderSmokeTest.test_real_xhtml_documentcCs"d}|j|}|j||jdS)Ns )rrrV)r docrr r rtest_nested_namespacess z.XMLTreeBuilderSmokeTest.test_nested_namespacescCs0d}t|d}d|j_|j}|jd|kdS)Nz/ zlxml-xmlzconsole.log("< < hey > > ");s< < hey > >)rrerprVrb)r rrZencodedr r r5test_formatter_processes_script_tag_for_xml_documentss  zMXMLTreeBuilderSmokeTest.test_formatter_processes_script_tag_for_xml_documentscCs"d}|j|}|jd|jjdS)Nu?Sacré bleu!u Sacré bleu!)rrrootrp)r rrr r rrs z7XMLTreeBuilderSmokeTest.test_can_parse_unicode_documentcCs$d}|j|}|jt|j|dS)Nzb2012-07-02T20:33:42Zcd)rrrFZrss)r rrr r rtest_popping_namespaced_tags z3XMLTreeBuilderSmokeTest.test_popping_namespaced_tagcCs |jd}|j|jdddS)Nzlatin1s/ )rrrV)r rr r r(test_docstring_includes_correct_encodings z@XMLTreeBuilderSmokeTest.test_docstring_includes_correct_encodingcCs0dddd}|j|}|j|jd|dS) z 0r: szutf-8Ni)rrrV)r rrr r rtest_large_xml_documents z/XMLTreeBuilderSmokeTest.test_large_xml_documentcCs|jdd|jddS)Nz

z

z

foo

)r)r r r r9test_tags_are_empty_element_if_and_only_if_they_are_emptys zQXMLTreeBuilderSmokeTest.test_tags_are_empty_element_if_and_only_if_they_are_emptycCs8d}|j|}|j}|jd|d|jd|ddS)NzThis tag is in the a namespaceThis tag is in the b namespacezhttp://example.com/zxmlns:azhttp://example.net/zxmlns:b)rrr)r rrrr r rtest_namespaces_are_preserveds  z5XMLTreeBuilderSmokeTest.test_namespaces_are_preservedcCs$d}|j|}|jt|j|dS)NzN

20010504

)rrrFrH)r rrr r rtest_closing_namespaced_tags z3XMLTreeBuilderSmokeTest.test_closing_namespaced_tagcCs$d}|j|}|jt|j|dS)Nzs)rrrFrC)r rrr r rtest_namespaced_attributess z2XMLTreeBuilderSmokeTest.test_namespaced_attributescCs$d}|j|}|jt|j|dS)Nzbar)rrrFrC)r rrr r r(test_namespaced_attributes_xml_namespaces z@XMLTreeBuilderSmokeTest.test_namespaced_attributes_xml_namespacecCsd}|j|}|jdt|jd|jdt|jd|jdt|jd|jdt|jddd |jdt|jddgdS) Na foo bar baz tagr:zns1:tagzns2:tagvalue)key)rrrGrY)r rrr r rtest_find_by_prefixed_names  z2XMLTreeBuilderSmokeTest.test_find_by_prefixed_namecCs2d}|j|}|j}tj|}|j|j|jdS)Nzf )rZdocumentr]rprefix)r ZxmlrrZ duplicater r r!test_copy_tag_preserves_namespaces   z9XMLTreeBuilderSmokeTest.test_copy_tag_preserves_namespaceN)rrr rBrrr\rXrrrrrrrrrrrrrr r r rrjs$     rc@s8eZdZdZddZddZddZdd Zd d Zd S) HTML5TreeBuilderSmokeTestz2Smoke test for a tree builder that supports HTML5.cCsdS)Nr )r r r rrXsz2HTML5TreeBuilderSmokeTest.test_real_xhtml_documentcCs"d}|j|}|jd|jjdS)Nz
zhttp://www.w3.org/1999/xhtml)rrru namespace)r rrr r rtest_html_tags_have_namespaces z7HTML5TreeBuilderSmokeTest.test_html_tags_have_namespacecCs6d}|j|}d}|j||jj|j||jjdS)Nzzhttp://www.w3.org/2000/svg)rrZsvgrZcircle)r rrrr r rtest_svg_tags_have_namespace s  z6HTML5TreeBuilderSmokeTest.test_svg_tags_have_namespacecCs6d}|j|}d}|j||jj|j||jjdS)Nz5z"http://www.w3.org/1998/Math/MathML)rrZmathrZmsqrt)r rrrr r rtest_mathml_tags_have_namespaces  z9HTML5TreeBuilderSmokeTest.test_mathml_tags_have_namespacecCsPd}|j|}|jt|jdt|j|jdd|jd|jdjjdS)Nz3rz$?xml version="1.0" encoding="utf-8"?rL)rrbrrErrrr8)r rrr r r$test_xml_declaration_becomes_comments  z>HTML5TreeBuilderSmokeTest.test_xml_declaration_becomes_commentN) rrr rrXrrrrr r r rrs rcsddfdd}|S)Nc_sdS)Nr )Ztestargsrr r rnothing!szskipIf..nothingcsrS|SdS)Nr )Z test_item) conditionrr r decorator$szskipIf..decoratorr )rreasonrr )rrrskipIf sr)rZ __license__r;r] functoolsZunittestrZbs4rZ bs4.elementrrrrrZ bs4.builderr r r objectr"rrrr r r rs(   %/#PK!+S0#0#1builder/__pycache__/__init__.cpython-36.opt-1.pycnu[3 6]/@s,ddlmZddlZddlZddlmZmZmZmZddddgZ dZ d Z d Z d Z d Zd ZGdddeZeZGdddeZGdddeZGdddeZddZGdddeZddlmZeeyddlmZeeWnek rYnXyddlmZeeWnek r&YnXdS)) defaultdictN)CharsetMetaAttributeValueContentMetaAttributeValueHTMLAwareEntitySubstitution whitespace_reHTMLTreeBuilderSAXTreeBuilder TreeBuilderTreeBuilderRegistryZfastZ permissivestrictZxmlZhtmlZhtml5c@s$eZdZddZddZddZdS)r cCstt|_g|_dS)N)rlistbuilders_for_featurebuilders)selfr/usr/lib/python3.6/__init__.py__init__ s zTreeBuilderRegistry.__init__cCs4x |jD]}|j|jd|qW|jjd|dS)z8Register a treebuilder based on its advertised features.rN)featuresr insertr)rZtreebuilder_classfeaturerrrregister$s zTreeBuilderRegistry.registercGst|jdkrdSt|dkr(|jdSt|}|jd}d}xVt|dkr|j}|jj|g}t|dkrB|dkr|}t|}qB|jt|}qBW|dkrdSx|D]}||kr|SqWdS)Nr) lenrr reversepopr getset intersection)rrZ candidatesZ candidate_setrZwe_have_the_feature candidaterrrlookup*s.     
zTreeBuilderRegistry.lookupN)__name__ __module__ __qualname__rrrrrrrr sc@steZdZdZdZgZgZdZdZe Z dZ iZ ddZ ddZd d Zd d Zdd dZddZddZddZdS)r z2Turn a document into a Beautiful Soup object tree.z[Unknown tree builder]FNcCs d|_dS)N)soup)rrrrrfszTreeBuilder.__init__cCsdS)Nr)rrrrresetiszTreeBuilder.resetcCs|jdkrdS||jkS)aMight a tag with this name be an empty-element tag? The final markup may or may not actually present this tag as self-closing. For instance: an HTMLBuilder does not consider a

tag to be an empty-element tag (it's not in HTMLBuilder.empty_element_tags). This means an empty

tag will be presented as "

", not "

". The default implementation has no opinion about which tags are empty-element tags, so a tag will be presented as an empty-element tag if and only if it has no contents. "" will become "", and "bar" will be left alone. NT)empty_element_tags)rtag_namerrrcan_be_empty_elementls z TreeBuilder.can_be_empty_elementcCs tdS)N)NotImplementedError)rmarkuprrrfeedszTreeBuilder.feedcCs |dddfS)NFr)rr(Zuser_specified_encodingZdocument_declared_encodingrrrprepare_markupszTreeBuilder.prepare_markupcCs|S)aWrap an HTML fragment to make it look like a document. Different parsers do this differently. For instance, lxml introduces an empty tag, and html5lib doesn't. Abstracting this away lets us write simple tests which run HTML fragments through the parser and compare the results against other HTML fragments. This method should not be used outside of tests. r)rZfragmentrrrtest_fragment_to_documents z%TreeBuilder.test_fragment_to_documentcCsdS)NFr)rtagrrrset_up_substitutionssz TreeBuilder.set_up_substitutionscCs|s|S|jr|jjdg}|jj|jd}xRt|jD]B}||ksT|r<||kr<||}t|trrtj|}n|}|||<qsz.SAXTreeBuilder.startElement..)dictr itemsr"Zhandle_starttag)rnamer5rrr startElementszSAXTreeBuilder.startElementcCs|jj|dS)N)r"Z handle_endtag)rrDrrr endElementszSAXTreeBuilder.endElementcCs|j||dS)N)rE)rnsTuplenodeNamer5rrrstartElementNSszSAXTreeBuilder.startElementNScCs|j|dS)N)rF)rrGrHrrr endElementNSszSAXTreeBuilder.endElementNScCsdS)Nr)rprefixZ nodeValuerrrstartPrefixMappingsz!SAXTreeBuilder.startPrefixMappingcCsdS)Nr)rrKrrrendPrefixMappingszSAXTreeBuilder.endPrefixMappingcCs|jj|dS)N)r"Z handle_data)rcontentrrr charactersszSAXTreeBuilder.characterscCsdS)Nr)rrrr startDocumentszSAXTreeBuilder.startDocumentcCsdS)Nr)rrrr endDocumentszSAXTreeBuilder.endDocumentN)rr r!r:r)r=rErFrIrJrLrMrOrPrQrrrrrsc$@seZdZdZejZedddddddd d d d d ddddddddddddgZeddddddd d!d"d#d$d%d&d'd(d)d*d+d,d-d.dd/d0d1d2d3d4d5d6d7d8d9d:d;g#Zdgd?d@gd?d@gdAgdAgdAgdBgdCgd?gdDgdEgdFgdG Z dHdIZ dJS)Krz]This TreeBuilder knows facts about HTML. 
Such as which tags are empty-element tags. areabasebrcolZembedZhrZimginputZkeygenlinkZmenuitemmetaZparamsourceZtrackZwbrZbasefontZbgsoundZcommandframeZimageZisindexZnextidZspacerZaddressZarticleZasideZ blockquoteZcanvasZddZdivZdlZdtZfieldsetZ figcaptionZfigureZfooterformZh1Zh2Zh3Zh4Zh5Zh6headerZlimainZnavZnoscriptZoloutputpZpreZsectiontableZtfootZulZvideoclassZ accesskeyZdropzoneZrelZrevZheaderszaccept-charsetarchiveZsizesZsandboxfor) r.arWtdZthrer[objectrRZiconZiframer^cCsz|jdkrdS|jd}|jd}|jd}d}|dk rJ|}t||d<n(|dk rr|dk rr|jdkrrt||d<|dk S)NrXFz http-equivrNcharsetz content-type)rDrrr0r)rr,Z http_equivrNrgZ meta_encodingrrrr-s      z$HTMLTreeBuilder.set_up_substitutionsN) rr r!r:rr<rr$Zblock_elementsr/r-rrrrrs& N  cCsVtjd}xF|jD]<}t||}t|trt||||jj||jj |qWdS)z9Copy TreeBuilders from the given module into this module.z bs4.builderN) sysmodules__all__getattr issubclassr setattrappendbuilder_registryr)moduleZ this_modulerDobjrrrregister_treebuilders_from2s      rrc@s eZdZdS)ParserRejectedMarkupN)rr r!rrrrrs?srsr>) _htmlparser) _html5lib)_lxml) collectionsr itertoolsrhZ bs4.elementrrrrrjZFASTZ PERMISSIVEZSTRICTZXMLZHTMLZHTML_5rfr ror rrrr Exceptionrsrtru ImportErrorrvrrrrs@ 4b.N      PK!+S0#0#+builder/__pycache__/__init__.cpython-36.pycnu[3 6]/@s,ddlmZddlZddlZddlmZmZmZmZddddgZ dZ d Z d Z d Z d Zd ZGdddeZeZGdddeZGdddeZGdddeZddZGdddeZddlmZeeyddlmZeeWnek rYnXyddlmZeeWnek r&YnXdS)) defaultdictN)CharsetMetaAttributeValueContentMetaAttributeValueHTMLAwareEntitySubstitution whitespace_reHTMLTreeBuilderSAXTreeBuilder TreeBuilderTreeBuilderRegistryZfastZ permissivestrictZxmlZhtmlZhtml5c@s$eZdZddZddZddZdS)r cCstt|_g|_dS)N)rlistbuilders_for_featurebuilders)selfr/usr/lib/python3.6/__init__.py__init__ s zTreeBuilderRegistry.__init__cCs4x |jD]}|j|jd|qW|jjd|dS)z8Register a treebuilder based on its advertised features.rN)featuresr insertr)rZtreebuilder_classfeaturerrrregister$s 
zTreeBuilderRegistry.registercGst|jdkrdSt|dkr(|jdSt|}|jd}d}xVt|dkr|j}|jj|g}t|dkrB|dkr|}t|}qB|jt|}qBW|dkrdSx|D]}||kr|SqWdS)Nr) lenrr reversepopr getset intersection)rrZ candidatesZ candidate_setrZwe_have_the_feature candidaterrrlookup*s.     zTreeBuilderRegistry.lookupN)__name__ __module__ __qualname__rrrrrrrr sc@steZdZdZdZgZgZdZdZe Z dZ iZ ddZ ddZd d Zd d Zdd dZddZddZddZdS)r z2Turn a document into a Beautiful Soup object tree.z[Unknown tree builder]FNcCs d|_dS)N)soup)rrrrrfszTreeBuilder.__init__cCsdS)Nr)rrrrresetiszTreeBuilder.resetcCs|jdkrdS||jkS)aMight a tag with this name be an empty-element tag? The final markup may or may not actually present this tag as self-closing. For instance: an HTMLBuilder does not consider a

tag to be an empty-element tag (it's not in HTMLBuilder.empty_element_tags). This means an empty

tag will be presented as "

", not "

". The default implementation has no opinion about which tags are empty-element tags, so a tag will be presented as an empty-element tag if and only if it has no contents. "" will become "", and "bar" will be left alone. NT)empty_element_tags)rtag_namerrrcan_be_empty_elementls z TreeBuilder.can_be_empty_elementcCs tdS)N)NotImplementedError)rmarkuprrrfeedszTreeBuilder.feedcCs |dddfS)NFr)rr(Zuser_specified_encodingZdocument_declared_encodingrrrprepare_markupszTreeBuilder.prepare_markupcCs|S)aWrap an HTML fragment to make it look like a document. Different parsers do this differently. For instance, lxml introduces an empty tag, and html5lib doesn't. Abstracting this away lets us write simple tests which run HTML fragments through the parser and compare the results against other HTML fragments. This method should not be used outside of tests. r)rZfragmentrrrtest_fragment_to_documents z%TreeBuilder.test_fragment_to_documentcCsdS)NFr)rtagrrrset_up_substitutionssz TreeBuilder.set_up_substitutionscCs|s|S|jr|jjdg}|jj|jd}xRt|jD]B}||ksT|r<||kr<||}t|trrtj|}n|}|||<qsz.SAXTreeBuilder.startElement..)dictr itemsr"Zhandle_starttag)rnamer5rrr startElementszSAXTreeBuilder.startElementcCs|jj|dS)N)r"Z handle_endtag)rrDrrr endElementszSAXTreeBuilder.endElementcCs|j||dS)N)rE)rnsTuplenodeNamer5rrrstartElementNSszSAXTreeBuilder.startElementNScCs|j|dS)N)rF)rrGrHrrr endElementNSszSAXTreeBuilder.endElementNScCsdS)Nr)rprefixZ nodeValuerrrstartPrefixMappingsz!SAXTreeBuilder.startPrefixMappingcCsdS)Nr)rrKrrrendPrefixMappingszSAXTreeBuilder.endPrefixMappingcCs|jj|dS)N)r"Z handle_data)rcontentrrr charactersszSAXTreeBuilder.characterscCsdS)Nr)rrrr startDocumentszSAXTreeBuilder.startDocumentcCsdS)Nr)rrrr endDocumentszSAXTreeBuilder.endDocumentN)rr r!r:r)r=rErFrIrJrLrMrOrPrQrrrrrsc$@seZdZdZejZedddddddd d d d d ddddddddddddgZeddddddd d!d"d#d$d%d&d'd(d)d*d+d,d-d.dd/d0d1d2d3d4d5d6d7d8d9d:d;g#Zdgd?d@gd?d@gdAgdAgdAgdBgdCgd?gdDgdEgdFgdG Z dHdIZ dJS)Krz]This TreeBuilder knows facts about HTML. 
Such as which tags are empty-element tags. areabasebrcolZembedZhrZimginputZkeygenlinkZmenuitemmetaZparamsourceZtrackZwbrZbasefontZbgsoundZcommandframeZimageZisindexZnextidZspacerZaddressZarticleZasideZ blockquoteZcanvasZddZdivZdlZdtZfieldsetZ figcaptionZfigureZfooterformZh1Zh2Zh3Zh4Zh5Zh6headerZlimainZnavZnoscriptZoloutputpZpreZsectiontableZtfootZulZvideoclassZ accesskeyZdropzoneZrelZrevZheaderszaccept-charsetarchiveZsizesZsandboxfor) r.arWtdZthrer[objectrRZiconZiframer^cCsz|jdkrdS|jd}|jd}|jd}d}|dk rJ|}t||d<n(|dk rr|dk rr|jdkrrt||d<|dk S)NrXFz http-equivrNcharsetz content-type)rDrrr0r)rr,Z http_equivrNrgZ meta_encodingrrrr-s      z$HTMLTreeBuilder.set_up_substitutionsN) rr r!r:rr<rr$Zblock_elementsr/r-rrrrrs& N  cCsVtjd}xF|jD]<}t||}t|trt||||jj||jj |qWdS)z9Copy TreeBuilders from the given module into this module.z bs4.builderN) sysmodules__all__getattr issubclassr setattrappendbuilder_registryr)moduleZ this_modulerDobjrrrregister_treebuilders_from2s      rrc@s eZdZdS)ParserRejectedMarkupN)rr r!rrrrrs?srsr>) _htmlparser) _html5lib)_lxml) collectionsr itertoolsrhZ bs4.elementrrrrrjZFASTZ PERMISSIVEZSTRICTZXMLZHTMLZHTML_5rfr ror rrrr Exceptionrsrtru ImportErrorrvrrrrs@ 4b.N      PK!,,2builder/__pycache__/_html5lib.cpython-36.opt-1.pycnu[3 6]0A@sdgZddlZddlZddlmZmZmZmZddlm Z m Z ddl Z ddl m Z mZddlmZmZmZmZyddlmZdZWn2ek rZzdd lmZd ZWYddZ[XnXGd ddeZGd d d ejZGdddeZGdddejZ Gddde Z!dS)HTML5TreeBuilderN) PERMISSIVEHTMLHTML_5HTMLTreeBuilder)NamespacedAttribute whitespace_re) namespacesprefixes)CommentDoctypeNavigableStringTag)_baseF)baseTc@sBeZdZdZdZeeeegZd ddZ ddZ dd Z d d Z dS) rzUse html5lib to build a tree.html5libNccs&||_|rtjd|dddfVdS)NzjYou provided a value for exclude_encoding, but the html5lib tree builder doesn't support exclude_encoding.F)user_specified_encodingwarningswarn)selfmarkuprZdocument_declared_encodingZexclude_encodingsr/usr/lib/python3.6/_html5lib.pyprepare_markup0s zHTML5TreeBuilder.prepare_markupcCs|jjdk rtjdtj|jd}t}t|t sNt rD|j |d<n |j |d<|j |f|}t|t 
rnd|_ n$|jjjd}t|t s|j}||_ dS)NzYou provided a value for parse_only, but the html5lib tree builder doesn't support parse_only. The entire document will be parsed.)ZtreeZoverride_encodingencodingr)soupZ parse_onlyrrrZ HTMLParsercreate_treebuilderdict isinstancestr new_html5librparseoriginal_encodingZ tokenizerstreamZ charEncodingname)rrparserZ extra_kwargsdocr"rrrfeed=s       zHTML5TreeBuilder.feedcCst||j|_|jS)N)TreeBuilderForHtml5librZunderlying_builder)rnamespaceHTMLElementsrrrrXs z#HTML5TreeBuilder.create_treebuildercCsd|S)zSee `TreeBuilder`.z)%sr)rZfragmentrrrtest_fragment_to_document]sz*HTML5TreeBuilder.test_fragment_to_document)NN) __name__ __module__ __qualname____doc__NAMErrrZfeaturesrr'rr*rrrrr)s  csfeZdZdfdd ZddZddZdd Zd d Zd d ZddZ ddZ ddZ ddZ Z S)r(Ncs8|r ||_nddlm}|dd|_tt|j|dS)Nr) BeautifulSoupz html.parser)rbs4r0superr(__init__)rr)rr0) __class__rrr4ds   zTreeBuilderForHtml5lib.__init__cCs|jjt|j|jdS)N)rresetElement)rrrr documentClassls z$TreeBuilderForHtml5lib.documentClasscCs6|d}|d}|d}tj|||}|jj|dS)Nr$publicIdsystemId)r Zfor_name_and_idsrobject_was_parsed)rtokenr$r9r:Zdoctyperrr insertDoctypeps z$TreeBuilderForHtml5lib.insertDoctypecCs|jj||}t||j|S)N)rnew_tagr7)rr$ namespacetagrrr elementClassxsz#TreeBuilderForHtml5lib.elementClasscCstt||jS)N)TextNoder r)rdatarrr commentClass|sz#TreeBuilderForHtml5lib.commentClasscCs0ddlm}|dd|_d|j_t|j|jdS)Nr)r0r1z html.parserz[document_fragment])r2r0rr$r7)rr0rrr fragmentClasss  z$TreeBuilderForHtml5lib.fragmentClasscCs|jj|jdS)N)rappendelement)rnoderrr appendChildsz"TreeBuilderForHtml5lib.appendChildcCs|jS)N)r)rrrr getDocumentsz"TreeBuilderForHtml5lib.getDocumentcCstjj|jS)N)treebuilder_base TreeBuilder getFragmentrG)rrrrrMsz"TreeBuilderForHtml5lib.getFragmentcsBddlmgtjddfdd |ddjS)Nr)r0z8^(.*?)(?: PUBLIC "(.*?)"(?: "(.*?)")?| SYSTEM "(.*?)")?$c st|r t|trj|}|r|jd}|jdkrx|jdpBd}|jdpZ|jdpZd}jdd||||fqjdd||fnjd d|fnHt|tr̈jd d||fn$t|trjd d||fn|jrd t |j|j f}n|j }jd d||f|j rg}x`t |j j D]N\}}t|trnd t |j|j f}t|t 
NFutf8) is_xmlr processing_instruction_classr rstrencoderZ encodingsmarkup) rr4Zuser_specified_encodingZexclude_encodingsZdocument_declared_encodingZis_htmlZ try_encodingsZdetectorrrrrprepare_markupXs      z$LXMLTreeBuilderForXML.prepare_markupcCst|trt|}nt|tr&t|}|j|j}y`|j|jj |_ |j j |x4t |dkr|j|j}t |dkrR|j j |qRW|j j Wn6tttjfk r}ztt|WYdd}~XnXdS)Nr)rbytesrr2rread CHUNK_SIZEr!r$original_encodingr feedlencloseUnicodeDecodeError LookupErrorr ParserErrorr)rr4dataerrrr:}s       zLXMLTreeBuilderForXML.feedcCs|jg|_dS)N)r%r&)rrrrr<szLXMLTreeBuilderForXML.closec Cs*t|}d}t|dkr4t|jdkr4|jjdnht|dkrtddt|jD}|jj||j}x,t|jD]\}}td|d}|||<q|Wi} xVt|jD]F\} } |j| \}} |dkr| | | <q|j |}t|| |} | | | <qW| }|j|\}}|j |}|j j ||||dS)Nrr)css|]\}}||fVqdS)Nr).0keyvaluerrr sz.LXMLTreeBuilderForXML.start..Zxmlnszhttp://www.w3.org/2000/xmlns/) dictr;r&appendlistitemscopyr r._prefix_for_namespacer$Zhandle_starttag) rnameZattrsZnsmapnsprefixinverted_nsmapprefix namespaceZ attributeZ new_attrsattrrDrrrstarts0         zLXMLTreeBuilderForXML.startcCs<|dkr dSx*t|jD]}|dk r||kr||SqWdS)z9Find the currently active prefix for the given namespace.N)reversedr&)rrPrNrrrrKs  z+LXMLTreeBuilderForXML._prefix_for_namespacecCs|jj|jjd}|j|\}}d}|dk r^x,t|jD]}|dk r<||kr<||}Pq %sr)rfragmentrrrtest_fragment_to_documentsz/LXMLTreeBuilderForXML.test_fragment_to_document)NN)NNN)!__name__ __module__ __qualname__rrZDEFAULT_PARSER_CLASSr0r r1NAMEALTERNATE_NAMESLXMLrr rfeaturesr8r%rr!r'r.r5r:r<rRrKrWrZr@r]r^r`rrrrr#s2  # ( c@sFeZdZeZdgZeeeeegZ dZ e Z ddZ ddZddZd S) rz lxml-htmlFcCstjS)N)rZ HTMLParser)rrrrrrszLXMLTreeBuilder.default_parsercCsj|jj}y&|j||_|jj||jjWn6tttj fk rd}zt t |WYdd}~XnXdS)N) r$r9r!r r:r<r=r>rr?rr2)rr4rrArrrr:s  zLXMLTreeBuilder.feedcCsd|S)zSee `TreeBuilder`.z%sr)rr_rrrr`sz)LXMLTreeBuilder.test_fragment_to_documentN)rarbrcrfrdrerr rrgr0r r1rr:r`rrrrrs )__all__collections.abcr ImportErrorrA collectionsiorrrrZ bs4.elementrr r r r Z bs4.builderr rrrrrrZ bs4.dammitrrfrrrrrrs   $ LPK!//builder/__init__.pynu[# Use of this 
source code is governed by a BSD-style license that can be # found in the LICENSE file. from collections import defaultdict import itertools import sys from bs4.element import ( CharsetMetaAttributeValue, ContentMetaAttributeValue, HTMLAwareEntitySubstitution, whitespace_re ) __all__ = [ 'HTMLTreeBuilder', 'SAXTreeBuilder', 'TreeBuilder', 'TreeBuilderRegistry', ] # Some useful features for a TreeBuilder to have. FAST = 'fast' PERMISSIVE = 'permissive' STRICT = 'strict' XML = 'xml' HTML = 'html' HTML_5 = 'html5' class TreeBuilderRegistry(object): def __init__(self): self.builders_for_feature = defaultdict(list) self.builders = [] def register(self, treebuilder_class): """Register a treebuilder based on its advertised features.""" for feature in treebuilder_class.features: self.builders_for_feature[feature].insert(0, treebuilder_class) self.builders.insert(0, treebuilder_class) def lookup(self, *features): if len(self.builders) == 0: # There are no builders at all. return None if len(features) == 0: # They didn't ask for any features. Give them the most # recently registered builder. return self.builders[0] # Go down the list of features in order, and eliminate any builders # that don't match every feature. features = list(features) features.reverse() candidates = None candidate_set = None while len(features) > 0: feature = features.pop() we_have_the_feature = self.builders_for_feature.get(feature, []) if len(we_have_the_feature) > 0: if candidates is None: candidates = we_have_the_feature candidate_set = set(candidates) else: # Eliminate any candidates that don't have this feature. candidate_set = candidate_set.intersection( set(we_have_the_feature)) # The only valid candidates are the ones in candidate_set. # Go through the original list of candidates and pick the first one # that's in candidate_set. 
if candidate_set is None: return None for candidate in candidates: if candidate in candidate_set: return candidate return None # The BeautifulSoup class will take feature lists from developers and use them # to look up builders in this registry. builder_registry = TreeBuilderRegistry() class TreeBuilder(object): """Turn a document into a Beautiful Soup object tree.""" NAME = "[Unknown tree builder]" ALTERNATE_NAMES = [] features = [] is_xml = False picklable = False preserve_whitespace_tags = set() empty_element_tags = None # A tag will be considered an empty-element # tag when and only when it has no contents. # A value for these tag/attribute combinations is a space- or # comma-separated list of CDATA, rather than a single CDATA. cdata_list_attributes = {} def __init__(self): self.soup = None def reset(self): pass def can_be_empty_element(self, tag_name): """Might a tag with this name be an empty-element tag? The final markup may or may not actually present this tag as self-closing. For instance: an HTMLBuilder does not consider a

tag to be an empty-element tag (it's not in HTMLBuilder.empty_element_tags). This means an empty

tag will be presented as "

", not "

". The default implementation has no opinion about which tags are empty-element tags, so a tag will be presented as an empty-element tag if and only if it has no contents. "" will become "", and "bar" will be left alone. """ if self.empty_element_tags is None: return True return tag_name in self.empty_element_tags def feed(self, markup): raise NotImplementedError() def prepare_markup(self, markup, user_specified_encoding=None, document_declared_encoding=None): return markup, None, None, False def test_fragment_to_document(self, fragment): """Wrap an HTML fragment to make it look like a document. Different parsers do this differently. For instance, lxml introduces an empty tag, and html5lib doesn't. Abstracting this away lets us write simple tests which run HTML fragments through the parser and compare the results against other HTML fragments. This method should not be used outside of tests. """ return fragment def set_up_substitutions(self, tag): return False def _replace_cdata_list_attribute_values(self, tag_name, attrs): """Replaces class="foo bar" with class=["foo", "bar"] Modifies its input in place. """ if not attrs: return attrs if self.cdata_list_attributes: universal = self.cdata_list_attributes.get('*', []) tag_specific = self.cdata_list_attributes.get( tag_name.lower(), None) for attr in list(attrs.keys()): if attr in universal or (tag_specific and attr in tag_specific): # We have a "class"-type attribute whose string # value is a whitespace-separated list of # values. Split it into a list. value = attrs[attr] if isinstance(value, str): values = whitespace_re.split(value) else: # html5lib sometimes calls setAttributes twice # for the same tag when rearranging the parse # tree. On the second call the attribute value # here is already a list. If this happens, # leave the value alone rather than trying to # split it again. 
values = value attrs[attr] = values return attrs class SAXTreeBuilder(TreeBuilder): """A Beautiful Soup treebuilder that listens for SAX events.""" def feed(self, markup): raise NotImplementedError() def close(self): pass def startElement(self, name, attrs): attrs = dict((key[1], value) for key, value in list(attrs.items())) #print "Start %s, %r" % (name, attrs) self.soup.handle_starttag(name, attrs) def endElement(self, name): #print "End %s" % name self.soup.handle_endtag(name) def startElementNS(self, nsTuple, nodeName, attrs): # Throw away (ns, nodeName) for now. self.startElement(nodeName, attrs) def endElementNS(self, nsTuple, nodeName): # Throw away (ns, nodeName) for now. self.endElement(nodeName) #handler.endElementNS((ns, node.nodeName), node.nodeName) def startPrefixMapping(self, prefix, nodeValue): # Ignore the prefix for now. pass def endPrefixMapping(self, prefix): # Ignore the prefix for now. # handler.endPrefixMapping(prefix) pass def characters(self, content): self.soup.handle_data(content) def startDocument(self): pass def endDocument(self): pass class HTMLTreeBuilder(TreeBuilder): """This TreeBuilder knows facts about HTML. Such as which tags are empty-element tags. """ preserve_whitespace_tags = HTMLAwareEntitySubstitution.preserve_whitespace_tags empty_element_tags = set([ # These are from HTML5. 'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'keygen', 'link', 'menuitem', 'meta', 'param', 'source', 'track', 'wbr', # These are from earlier versions of HTML and are removed in HTML5. 'basefont', 'bgsound', 'command', 'frame', 'image', 'isindex', 'nextid', 'spacer' ]) # The HTML standard defines these as block-level elements. Beautiful # Soup does not treat these elements differently from other elements, # but it may do so eventually, and this information is available if # you need to use it. 
block_elements = set(["address", "article", "aside", "blockquote", "canvas", "dd", "div", "dl", "dt", "fieldset", "figcaption", "figure", "footer", "form", "h1", "h2", "h3", "h4", "h5", "h6", "header", "hr", "li", "main", "nav", "noscript", "ol", "output", "p", "pre", "section", "table", "tfoot", "ul", "video"]) # The HTML standard defines these attributes as containing a # space-separated list of values, not a single value. That is, # class="foo bar" means that the 'class' attribute has two values, # 'foo' and 'bar', not the single value 'foo bar'. When we # encounter one of these attributes, we will parse its value into # a list of values if possible. Upon output, the list will be # converted back into a string. cdata_list_attributes = { "*" : ['class', 'accesskey', 'dropzone'], "a" : ['rel', 'rev'], "link" : ['rel', 'rev'], "td" : ["headers"], "th" : ["headers"], "td" : ["headers"], "form" : ["accept-charset"], "object" : ["archive"], # These are HTML5 specific, as are *.accesskey and *.dropzone above. "area" : ["rel"], "icon" : ["sizes"], "iframe" : ["sandbox"], "output" : ["for"], } def set_up_substitutions(self, tag): # We are only interested in tags if tag.name != 'meta': return False http_equiv = tag.get('http-equiv') content = tag.get('content') charset = tag.get('charset') # We are interested in tags that say what encoding the # document was originally in. This means HTML 5-style # tags that provide the "charset" attribute. It also means # HTML 4-style tags that provide the "content" # attribute and have "http-equiv" set to "content-type". # # In both cases we will replace the value of the appropriate # attribute with a standin object that can take on any # encoding. 
meta_encoding = None if charset is not None: # HTML 5 style: # meta_encoding = charset tag['charset'] = CharsetMetaAttributeValue(charset) elif (content is not None and http_equiv is not None and http_equiv.lower() == 'content-type'): # HTML 4 style: # tag['content'] = ContentMetaAttributeValue(content) return (meta_encoding is not None) def register_treebuilders_from(module): """Copy TreeBuilders from the given module into this module.""" # I'm fairly sure this is not the best way to do this. this_module = sys.modules['bs4.builder'] for name in module.__all__: obj = getattr(module, name) if issubclass(obj, TreeBuilder): setattr(this_module, name, obj) this_module.__all__.append(name) # Register the builder while we're at it. this_module.builder_registry.register(obj) class ParserRejectedMarkup(Exception): pass # Builders are registered in reverse order of priority, so that custom # builder registrations will take precedence. In general, we want lxml # to take precedence over html5lib, because it's faster. And we only # want to use HTMLParser as a last result. from . import _htmlparser register_treebuilders_from(_htmlparser) try: from . import _html5lib register_treebuilders_from(_html5lib) except ImportError: # They don't have html5lib installed. pass try: from . import _lxml register_treebuilders_from(_lxml) except ImportError: # They don't have lxml installed. pass PK! 0A0Abuilder/_html5lib.pynu[# Use of this source code is governed by a BSD-style license that can be # found in the LICENSE file. 
__all__ = [
    'HTML5TreeBuilder',
    ]

import warnings
import re
from bs4.builder import (
    PERMISSIVE,
    HTML,
    HTML_5,
    HTMLTreeBuilder,
    )
from bs4.element import (
    NamespacedAttribute,
    whitespace_re,
)
import html5lib
from html5lib.constants import (
    namespaces,
    prefixes,
    )
from bs4.element import (
    Comment,
    Doctype,
    NavigableString,
    Tag,
    )

try:
    # Pre-0.99999999
    from html5lib.treebuilders import _base as treebuilder_base
    new_html5lib = False
except ImportError as e:
    # 0.99999999 and up
    from html5lib.treebuilders import base as treebuilder_base
    new_html5lib = True

class HTML5TreeBuilder(HTMLTreeBuilder):
    """Use html5lib to build a tree."""

    NAME = "html5lib"

    features = [NAME, PERMISSIVE, HTML_5, HTML]

    def prepare_markup(self, markup, user_specified_encoding,
                       document_declared_encoding=None, exclude_encodings=None):
        # Store the user-specified encoding for use later on.
        self.user_specified_encoding = user_specified_encoding

        # document_declared_encoding and exclude_encodings aren't used
        # ATM because the html5lib TreeBuilder doesn't use
        # UnicodeDammit.
        if exclude_encodings:
            warnings.warn("You provided a value for exclude_encoding, but the html5lib tree builder doesn't support exclude_encoding.")
        yield (markup, None, None, False)

    # These methods are defined by Beautiful Soup.
    def feed(self, markup):
        if self.soup.parse_only is not None:
            warnings.warn("You provided a value for parse_only, but the html5lib tree builder doesn't support parse_only. The entire document will be parsed.")
        parser = html5lib.HTMLParser(tree=self.create_treebuilder)

        extra_kwargs = dict()
        if not isinstance(markup, str):
            if new_html5lib:
                extra_kwargs['override_encoding'] = self.user_specified_encoding
            else:
                extra_kwargs['encoding'] = self.user_specified_encoding
        doc = parser.parse(markup, **extra_kwargs)

        # Set the character encoding detected by the tokenizer.
        if isinstance(markup, str):
            # We need to special-case this because html5lib sets
            # charEncoding to UTF-8 if it gets Unicode input.
            doc.original_encoding = None
        else:
            original_encoding = parser.tokenizer.stream.charEncoding[0]
            if not isinstance(original_encoding, str):
                # In 0.99999999 and up, the encoding is an html5lib
                # Encoding object. We want to use a string for compatibility
                # with other tree builders.
                original_encoding = original_encoding.name
            doc.original_encoding = original_encoding

    def create_treebuilder(self, namespaceHTMLElements):
        self.underlying_builder = TreeBuilderForHtml5lib(
            namespaceHTMLElements, self.soup)
        return self.underlying_builder

    def test_fragment_to_document(self, fragment):
        """See `TreeBuilder`."""
        return '<html><head></head><body>%s</body></html>' % fragment


class TreeBuilderForHtml5lib(treebuilder_base.TreeBuilder):

    def __init__(self, namespaceHTMLElements, soup=None):
        if soup:
            self.soup = soup
        else:
            from bs4 import BeautifulSoup
            self.soup = BeautifulSoup("", "html.parser")
        super(TreeBuilderForHtml5lib, self).__init__(namespaceHTMLElements)

    def documentClass(self):
        self.soup.reset()
        return Element(self.soup, self.soup, None)

    def insertDoctype(self, token):
        name = token["name"]
        publicId = token["publicId"]
        systemId = token["systemId"]

        doctype = Doctype.for_name_and_ids(name, publicId, systemId)
        self.soup.object_was_parsed(doctype)

    def elementClass(self, name, namespace):
        tag = self.soup.new_tag(name, namespace)
        return Element(tag, self.soup, namespace)

    def commentClass(self, data):
        return TextNode(Comment(data), self.soup)

    def fragmentClass(self):
        from bs4 import BeautifulSoup
        self.soup = BeautifulSoup("", "html.parser")
        self.soup.name = "[document_fragment]"
        return Element(self.soup, self.soup, None)

    def appendChild(self, node):
        # XXX This code is not covered by the BS4 tests.
        self.soup.append(node.element)

    def getDocument(self):
        return self.soup

    def getFragment(self):
        return treebuilder_base.TreeBuilder.getFragment(self).element

    def testSerializer(self, element):
        from bs4 import BeautifulSoup
        rv = []
        doctype_re = re.compile(r'^(.*?)(?: PUBLIC "(.*?)"(?: "(.*?)")?| SYSTEM "(.*?)")?$')

        def serializeElement(element, indent=0):
            if isinstance(element, BeautifulSoup):
                pass
            if isinstance(element, Doctype):
                m = doctype_re.match(element)
                if m:
                    name = m.group(1)
                    if m.lastindex > 1:
                        publicId = m.group(2) or ""
                        systemId = m.group(3) or m.group(4) or ""
                        rv.append("""|%s<!DOCTYPE %s "%s" "%s">""" %
                                  (' ' * indent, name, publicId, systemId))
                    else:
                        rv.append("|%s<!DOCTYPE %s>" % (' ' * indent, name))
                else:
                    rv.append("|%s<!DOCTYPE >" % (' ' * indent,))
            elif isinstance(element, Comment):
                rv.append("|%s<!-- %s -->" % (' ' * indent, element))
            elif isinstance(element, NavigableString):
                rv.append("|%s\"%s\"" % (' ' * indent, element))
            else:
                if element.namespace:
                    name = "%s %s" % (prefixes[element.namespace],
                                      element.name)
                else:
                    name = element.name
                rv.append("|%s<%s>" % (' ' * indent, name))
                if element.attrs:
                    attributes = []
                    for name, value in list(element.attrs.items()):
                        if isinstance(name, NamespacedAttribute):
                            name = "%s %s" % (prefixes[name.namespace], name.name)
                        if isinstance(value, list):
                            value = " ".join(value)
                        attributes.append((name, value))

                    for name, value in sorted(attributes):
                        rv.append('|%s%s="%s"' % (' ' * (indent + 2), name, value))
                indent += 2
                for child in element.children:
                    serializeElement(child, indent)
        serializeElement(element, 0)

        return "\n".join(rv)

class AttrList(object):
    def __init__(self, element):
        self.element = element
        self.attrs = dict(self.element.attrs)
    def __iter__(self):
        return list(self.attrs.items()).__iter__()
    def __setitem__(self, name, value):
        # If this attribute is a multi-valued attribute for this element,
        # turn its value into a list.
        list_attr = HTML5TreeBuilder.cdata_list_attributes
        if (name in list_attr['*']
            or (self.element.name in list_attr
                and name in list_attr[self.element.name])):
            # A node that is being cloned may have already undergone
            # this procedure.
            if not isinstance(value, list):
                value = whitespace_re.split(value)
        self.element[name] = value
    def items(self):
        return list(self.attrs.items())
    def keys(self):
        return list(self.attrs.keys())
    def __len__(self):
        return len(self.attrs)
    def __getitem__(self, name):
        return self.attrs[name]
    def __contains__(self, name):
        return name in list(self.attrs.keys())


class Element(treebuilder_base.Node):
    def __init__(self, element, soup, namespace):
        treebuilder_base.Node.__init__(self, element.name)
        self.element = element
        self.soup = soup
        self.namespace = namespace

    def appendChild(self, node):
        string_child = child = None
        if isinstance(node, str):
            # Some other piece of code decided to pass in a string
            # instead of creating a TextElement object to contain the
            # string.
            string_child = child = node
        elif isinstance(node, Tag):
            # Some other piece of code decided to pass in a Tag
            # instead of creating an Element object to contain the
            # Tag.
            child = node
        elif node.element.__class__ == NavigableString:
            string_child = child = node.element
            node.parent = self
        else:
            child = node.element
            node.parent = self

        if not isinstance(child, str) and child.parent is not None:
            node.element.extract()

        if (string_child and self.element.contents
            and self.element.contents[-1].__class__ == NavigableString):
            # We are appending a string onto another string.
            # TODO This has O(n^2) performance, for input like
            # "a</a>a</a>a</a>..."
            old_element = self.element.contents[-1]
            new_element = self.soup.new_string(old_element + string_child)
            old_element.replace_with(new_element)
            self.soup._most_recent_element = new_element
        else:
            if isinstance(node, str):
                # Create a brand new NavigableString from this string.
                child = self.soup.new_string(node)

            # Tell Beautiful Soup to act as if it parsed this element
            # immediately after the parent's last descendant. (Or
            # immediately after the parent, if it has no children.)
            if self.element.contents:
                most_recent_element = self.element._last_descendant(False)
            elif self.element.next_element is not None:
                # Something from further ahead in the parse tree is
                # being inserted into this earlier element. This is
                # very annoying because it means an expensive search
                # for the last element in the tree.
                most_recent_element = self.soup._last_descendant()
            else:
                most_recent_element = self.element

            self.soup.object_was_parsed(
                child, parent=self.element,
                most_recent_element=most_recent_element)

    def getAttributes(self):
        if isinstance(self.element, Comment):
            return {}
        return AttrList(self.element)

    def setAttributes(self, attributes):

        if attributes is not None and len(attributes) > 0:

            converted_attributes = []
            for name, value in list(attributes.items()):
                if isinstance(name, tuple):
                    new_name = NamespacedAttribute(*name)
                    del attributes[name]
                    attributes[new_name] = value

            self.soup.builder._replace_cdata_list_attribute_values(
                self.name, attributes)
            for name, value in list(attributes.items()):
                self.element[name] = value

            # The attributes may contain variables that need substitution.
            # Call set_up_substitutions manually.
            #
            # The Tag constructor called this method when the Tag was created,
            # but we just set/changed the attributes, so call it again.
            self.soup.builder.set_up_substitutions(self.element)
    attributes = property(getAttributes, setAttributes)

    def insertText(self, data, insertBefore=None):
        text = TextNode(self.soup.new_string(data), self.soup)
        if insertBefore:
            self.insertBefore(text, insertBefore)
        else:
            self.appendChild(text)

    def insertBefore(self, node, refNode):
        index = self.element.index(refNode.element)
        if (node.element.__class__ == NavigableString and self.element.contents
            and self.element.contents[index-1].__class__ == NavigableString):
            # (See comments in appendChild)
            old_node = self.element.contents[index-1]
            new_str = self.soup.new_string(old_node + node.element)
            old_node.replace_with(new_str)
        else:
            self.element.insert(index, node.element)
            node.parent = self

    def removeChild(self, node):
        node.element.extract()

    def reparentChildren(self, new_parent):
        """Move all of this tag's children into another tag."""
        # print "MOVE", self.element.contents
        # print "FROM", self.element
        # print "TO", new_parent.element

        element = self.element
        new_parent_element = new_parent.element
        # Determine what this tag's next_element will be once all the children
        # are removed.
        final_next_element = element.next_sibling

        new_parents_last_descendant = new_parent_element._last_descendant(False, False)
        if len(new_parent_element.contents) > 0:
            # The new parent already contains children. We will be
            # appending this tag's children to the end.
            new_parents_last_child = new_parent_element.contents[-1]
            new_parents_last_descendant_next_element = new_parents_last_descendant.next_element
        else:
            # The new parent contains no children.
            new_parents_last_child = None
            new_parents_last_descendant_next_element = new_parent_element.next_element

        to_append = element.contents
        if len(to_append) > 0:
            # Set the first child's previous_element and previous_sibling
            # to elements within the new parent
            first_child = to_append[0]
            if new_parents_last_descendant:
                first_child.previous_element = new_parents_last_descendant
            else:
                first_child.previous_element = new_parent_element
            first_child.previous_sibling = new_parents_last_child
            if new_parents_last_descendant:
                new_parents_last_descendant.next_element = first_child
            else:
                new_parent_element.next_element = first_child
            if new_parents_last_child:
                new_parents_last_child.next_sibling = first_child

            # Find the very last element being moved. It is now the
            # parent's last descendant. It has no .next_sibling and
            # its .next_element is whatever the previous last
            # descendant had.
            last_childs_last_descendant = to_append[-1]._last_descendant(False, True)

            last_childs_last_descendant.next_element = new_parents_last_descendant_next_element
            if new_parents_last_descendant_next_element:
                # TODO: This code has no test coverage and I'm not sure
                # how to get html5lib to go through this path, but it's
                # just the other side of the previous line.
                new_parents_last_descendant_next_element.previous_element = last_childs_last_descendant
            last_childs_last_descendant.next_sibling = None

        for child in to_append:
            child.parent = new_parent_element
            new_parent_element.contents.append(child)

        # Now that this element has no children, change its .next_element.
        element.contents = []
        element.next_element = final_next_element

        # print "DONE WITH MOVE"
        # print "FROM", self.element
        # print "TO", new_parent_element

    def cloneNode(self):
        tag = self.soup.new_tag(self.element.name, self.namespace)
        node = Element(tag, self.soup, self.namespace)
        for key, value in self.attributes:
            node.attributes[key] = value
        return node

    def hasContent(self):
        return self.element.contents

    def getNameTuple(self):
        if self.namespace is None:
            return namespaces["html"], self.name
        else:
            return self.namespace, self.name

    nameTuple = property(getNameTuple)

class TextNode(Element):
    def __init__(self, element, soup):
        treebuilder_base.Node.__init__(self, None)
        self.element = element
        self.soup = soup

    def cloneNode(self):
        raise NotImplementedError


# File: builder/_htmlparser.py
# encoding: utf-8
"""Use the HTMLParser library to parse HTML files that aren't too bad."""

# Use of this source code is governed by a BSD-style license that can be
# found in the LICENSE file.

__all__ = [
    'HTMLParserTreeBuilder',
    ]

from html.parser import HTMLParser

try:
    from html.parser import HTMLParseError
except ImportError as e:
    # HTMLParseError is removed in Python 3.5. Since it can never be
    # thrown in 3.5, we can just define our own class as a placeholder.
    class HTMLParseError(Exception):
        pass

import sys
import warnings

# Starting in Python 3.2, the HTMLParser constructor takes a 'strict'
# argument, which we'd like to set to False. Unfortunately,
# http://bugs.python.org/issue13273 makes strict=True a better bet
# before Python 3.2.3.
#
# At the end of this file, we monkeypatch HTMLParser so that
# strict=True works well on Python 3.2.2.
major, minor, release = sys.version_info[:3]
CONSTRUCTOR_TAKES_STRICT = major == 3 and minor == 2 and release >= 3
CONSTRUCTOR_STRICT_IS_DEPRECATED = major == 3 and minor == 3
CONSTRUCTOR_TAKES_CONVERT_CHARREFS = major == 3 and minor >= 4


from bs4.element import (
    CData,
    Comment,
    Declaration,
    Doctype,
    ProcessingInstruction,
    )
from bs4.dammit import EntitySubstitution, UnicodeDammit

from bs4.builder import (
    HTML,
    HTMLTreeBuilder,
    STRICT,
    )


HTMLPARSER = 'html.parser'

class BeautifulSoupHTMLParser(HTMLParser):

    def __init__(self, *args, **kwargs):
        HTMLParser.__init__(self, *args, **kwargs)

        # Keep a list of empty-element tags that were encountered
        # without an explicit closing tag. If we encounter a closing tag
        # of this type, we'll associate it with one of those entries.
        #
        # This isn't a stack because we don't care about the
        # order. It's a list of closing tags we've already handled and
        # will ignore, assuming they ever show up.
        self.already_closed_empty_element = []

    def error(self, msg):
        """In Python 3, HTMLParser subclasses must implement error(), although this
        requirement doesn't appear to be documented.

        In Python 2, HTMLParser implements error() as raising an exception.

        In any event, this method is called only on very strange markup and our best strategy
        is to pretend it didn't happen and keep going.
        """
        warnings.warn(msg)

    def handle_startendtag(self, name, attrs):
        # This is only called when the markup looks like
        # <tag/>.

        # Passing handle_empty_element=False tells handle_starttag() not
        # to close the tag just because its name matches a known
        # empty-element tag. We know that this is an empty-element tag
        # and we want to call handle_endtag() ourselves.
        tag = self.handle_starttag(name, attrs, handle_empty_element=False)
        self.handle_endtag(name)

    def handle_starttag(self, name, attrs, handle_empty_element=True):
        # XXX namespace
        attr_dict = {}
        for key, value in attrs:
            # Change None attribute values to the empty string
            # for consistency with the other tree builders.
            if value is None:
                value = ''
            attr_dict[key] = value
            attrvalue = '""'
        #print "START", name
        tag = self.soup.handle_starttag(name, None, None, attr_dict)
        if tag and tag.is_empty_element and handle_empty_element:
            # Unlike other parsers, html.parser doesn't send separate end tag
            # events for empty-element tags. (It's handled in
            # handle_startendtag, but only if the original markup looked like
            # <tag/>.)
            #
            # So we need to call handle_endtag() ourselves. Since we
            # know the start event is identical to the end event, we
            # don't want handle_endtag() to cross off any previous end
            # events for tags of this name.
            self.handle_endtag(name, check_already_closed=False)

            # But we might encounter an explicit closing tag for this tag
            # later on. If so, we want to ignore it.
            self.already_closed_empty_element.append(name)

    def handle_endtag(self, name, check_already_closed=True):
        #print "END", name
        if check_already_closed and name in self.already_closed_empty_element:
            # This is a redundant end tag for an empty-element tag.
            # We've already called handle_endtag() for it, so just
            # check it off the list.
            # print "ALREADY CLOSED", name
            self.already_closed_empty_element.remove(name)
        else:
            self.soup.handle_endtag(name)

    def handle_data(self, data):
        self.soup.handle_data(data)

    def handle_charref(self, name):
        # XXX workaround for a bug in HTMLParser. Remove this once
        # it's fixed in all supported versions.
        # http://bugs.python.org/issue13633
        if name.startswith('x'):
            real_name = int(name.lstrip('x'), 16)
        elif name.startswith('X'):
            real_name = int(name.lstrip('X'), 16)
        else:
            real_name = int(name)

        data = None
        if real_name < 256:
            # HTML numeric entities are supposed to reference Unicode
            # code points, but sometimes they reference code points in
            # some other encoding (ahem, Windows-1252). E.g. &#147;
            # instead of &#8220; for LEFT DOUBLE QUOTATION MARK. This
            # code tries to detect this situation and compensate.
            for encoding in (self.soup.original_encoding, 'windows-1252'):
                if not encoding:
                    continue
                try:
                    data = bytearray([real_name]).decode(encoding)
                except UnicodeDecodeError as e:
                    pass
        if not data:
            try:
                data = chr(real_name)
            except (ValueError, OverflowError) as e:
                pass
        data = data or "\N{REPLACEMENT CHARACTER}"
        self.handle_data(data)

    def handle_entityref(self, name):
        character = EntitySubstitution.HTML_ENTITY_TO_CHARACTER.get(name)
        if character is not None:
            data = character
        else:
            # If this were XML, it would be ambiguous whether "&foo"
            # was a character entity reference with a missing
            # semicolon or the literal string "&foo". Since this is
            # HTML, we have a complete list of all character entity references,
            # and this one wasn't found, so assume it's the literal string "&foo".
            data = "&%s" % name
        self.handle_data(data)

    def handle_comment(self, data):
        self.soup.endData()
        self.soup.handle_data(data)
        self.soup.endData(Comment)

    def handle_decl(self, data):
        self.soup.endData()
        if data.startswith("DOCTYPE "):
            data = data[len("DOCTYPE "):]
        elif data == 'DOCTYPE':
            # i.e. "<!DOCTYPE>"
            data = ''
        self.soup.handle_data(data)
        self.soup.endData(Doctype)

    def unknown_decl(self, data):
        if data.upper().startswith('CDATA['):
            cls = CData
            data = data[len('CDATA['):]
        else:
            cls = Declaration
        self.soup.endData()
        self.soup.handle_data(data)
        self.soup.endData(cls)

    def handle_pi(self, data):
        self.soup.endData()
        self.soup.handle_data(data)
        self.soup.endData(ProcessingInstruction)


class HTMLParserTreeBuilder(HTMLTreeBuilder):

    is_xml = False
    picklable = True
    NAME = HTMLPARSER
    features = [NAME, HTML, STRICT]

    def __init__(self, *args, **kwargs):
        if CONSTRUCTOR_TAKES_STRICT and not CONSTRUCTOR_STRICT_IS_DEPRECATED:
            kwargs['strict'] = False
        if CONSTRUCTOR_TAKES_CONVERT_CHARREFS:
            kwargs['convert_charrefs'] = False
        self.parser_args = (args, kwargs)

    def prepare_markup(self, markup, user_specified_encoding=None,
                       document_declared_encoding=None, exclude_encodings=None):
        """
        :return: A 4-tuple (markup, original encoding, encoding
        declared within markup, whether any characters had to be
        replaced with REPLACEMENT CHARACTER).
        """
        if isinstance(markup, str):
            yield (markup, None, None, False)
            return

        try_encodings = [user_specified_encoding, document_declared_encoding]
        dammit = UnicodeDammit(markup, try_encodings, is_html=True,
                               exclude_encodings=exclude_encodings)
        yield (dammit.markup, dammit.original_encoding,
               dammit.declared_html_encoding,
               dammit.contains_replacement_characters)

    def feed(self, markup):
        args, kwargs = self.parser_args
        parser = BeautifulSoupHTMLParser(*args, **kwargs)
        parser.soup = self.soup
        try:
            parser.feed(markup)
            parser.close()
        except HTMLParseError as e:
            warnings.warn(RuntimeWarning(
                "Python's built-in HTMLParser cannot parse the given document. This is not a bug in Beautiful Soup. The best solution is to install an external parser (lxml or html5lib), and use Beautiful Soup with that parser. See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser for help."))
            raise e
        parser.already_closed_empty_element = []

# Patch 3.2 versions of HTMLParser earlier than 3.2.3 to use some
# 3.2.3 code. This ensures they don't treat markup like <p></p> as a
# string.
#
# XXX This code can be removed once most Python 3 users are on 3.2.3.
if major == 3 and minor == 2 and not CONSTRUCTOR_TAKES_STRICT:
    import re
    attrfind_tolerant = re.compile(
        r'\s*((?<=[\'"\s])[^\s/>][^\s/=>]*)(\s*=+\s*'
        r'(\'[^\']*\'|"[^"]*"|(?![\'"])[^>\s]*))?')
    HTMLParserTreeBuilder.attrfind_tolerant = attrfind_tolerant

    locatestarttagend = re.compile(r"""
  <[a-zA-Z][-.a-zA-Z0-9:_]*          # tag name
  (?:\s+                             # whitespace before attribute name
    (?:[a-zA-Z_][-.:a-zA-Z0-9_]*     # attribute name
      (?:\s*=\s*                     # value indicator
        (?:'[^']*'                   # LITA-enclosed value
          |\"[^\"]*\"                # LIT-enclosed value
          |[^'\">\s]+                # bare value
         )
       )?
     )
   )*
  \s*                                # trailing whitespace
""", re.VERBOSE)
    BeautifulSoupHTMLParser.locatestarttagend = locatestarttagend

    from html.parser import tagfind, attrfind

    def parse_starttag(self, i):
        self.__starttag_text = None
        endpos = self.check_for_whole_start_tag(i)
        if endpos < 0:
            return endpos
        rawdata = self.rawdata
        self.__starttag_text = rawdata[i:endpos]

        # Now parse the data between i+1 and j into a tag and attrs
        attrs = []
        match = tagfind.match(rawdata, i+1)
        assert match, 'unexpected call to parse_starttag()'
        k = match.end()
        self.lasttag = tag = rawdata[i+1:k].lower()
        while k < endpos:
            if self.strict:
                m = attrfind.match(rawdata, k)
            else:
                m = attrfind_tolerant.match(rawdata, k)
            if not m:
                break
            attrname, rest, attrvalue = m.group(1, 2, 3)
            if not rest:
                attrvalue = None
            elif attrvalue[:1] == '\'' == attrvalue[-1:] or \
                 attrvalue[:1] == '"' == attrvalue[-1:]:
                attrvalue = attrvalue[1:-1]
            if attrvalue:
                attrvalue = self.unescape(attrvalue)
            attrs.append((attrname.lower(), attrvalue))
            k = m.end()

        end = rawdata[k:endpos].strip()
        if end not in (">", "/>"):
            lineno, offset = self.getpos()
            if "\n" in self.__starttag_text:
                lineno = lineno + self.__starttag_text.count("\n")
                offset = len(self.__starttag_text) \
                         - self.__starttag_text.rfind("\n")
            else:
                offset = offset + len(self.__starttag_text)
            if self.strict:
                self.error("junk characters in start tag: %r"
                           % (rawdata[k:endpos][:20],))
            self.handle_data(rawdata[i:endpos])
            return endpos
        if end.endswith('/>'):
            # XHTML-style empty tag: <span attr="value" />
            self.handle_startendtag(tag, attrs)
        else:
            self.handle_starttag(tag, attrs)
            if tag in self.CDATA_CONTENT_ELEMENTS:
                self.set_cdata_mode(tag)
        return endpos

    def set_cdata_mode(self, elem):
        self.cdata_elem = elem.lower()
        self.interesting = re.compile(r'</\s*%s\s*>' % self.cdata_elem, re.I)

    BeautifulSoupHTMLParser.parse_starttag = parse_starttag
    BeautifulSoupHTMLParser.set_cdata_mode = set_cdata_mode

    CONSTRUCTOR_TAKES_STRICT = True


# File: builder/_lxml.py
# Use of this source code is governed by a BSD-style license that can be
# found in the LICENSE file.
__all__ = [
    'LXMLTreeBuilderForXML',
    'LXMLTreeBuilder',
    ]

try:
    from collections.abc import Callable # Python 3.3 and up
except ImportError as e:
    from collections import Callable

from io import BytesIO
from io import StringIO
from lxml import etree
from bs4.element import (
    Comment,
    Doctype,
    NamespacedAttribute,
    ProcessingInstruction,
    XMLProcessingInstruction,
)
from bs4.builder import (
    FAST,
    HTML,
    HTMLTreeBuilder,
    PERMISSIVE,
    ParserRejectedMarkup,
    TreeBuilder,
    XML)
from bs4.dammit import EncodingDetector

LXML = 'lxml'

class LXMLTreeBuilderForXML(TreeBuilder):
    DEFAULT_PARSER_CLASS = etree.XMLParser

    is_xml = True
    processing_instruction_class = XMLProcessingInstruction

    NAME = "lxml-xml"
    ALTERNATE_NAMES = ["xml"]

    # Well, it's permissive by XML parser standards.
    features = [NAME, LXML, XML, FAST, PERMISSIVE]

    CHUNK_SIZE = 512

    # This namespace mapping is specified in the XML Namespace
    # standard.
    DEFAULT_NSMAPS = {'http://www.w3.org/XML/1998/namespace' : "xml"}

    def default_parser(self, encoding):
        # This can either return a parser object or a class, which
        # will be instantiated with default arguments.
        if self._default_parser is not None:
            return self._default_parser
        return etree.XMLParser(
            target=self, strip_cdata=False, recover=True, encoding=encoding)

    def parser_for(self, encoding):
        # Use the default parser.
        parser = self.default_parser(encoding)

        if isinstance(parser, Callable):
            # Instantiate the parser with default arguments
            parser = parser(target=self, strip_cdata=False, encoding=encoding)
        return parser

    def __init__(self, parser=None, empty_element_tags=None):
        # TODO: Issue a warning if parser is present but not a
        # callable, since that means there's no way to create new
        # parsers for different encodings.
        self._default_parser = parser
        if empty_element_tags is not None:
            self.empty_element_tags = set(empty_element_tags)
        self.soup = None
        self.nsmaps = [self.DEFAULT_NSMAPS]

    def _getNsTag(self, tag):
        # Split the namespace URL out of a fully-qualified lxml tag
        # name. Copied from lxml's src/lxml/sax.py.
        if tag[0] == '{':
            return tuple(tag[1:].split('}', 1))
        else:
            return (None, tag)

    def prepare_markup(self, markup, user_specified_encoding=None,
                       exclude_encodings=None,
                       document_declared_encoding=None):
        """
        :yield: A series of 4-tuples.
         (markup, encoding, declared encoding,
          has undergone character replacement)

        Each 4-tuple represents a strategy for parsing the document.
        """
        # Instead of using UnicodeDammit to convert the bytestring to
        # Unicode using different encodings, use EncodingDetector to
        # iterate over the encodings, and tell lxml to try to parse
        # the document as each one in turn.
        is_html = not self.is_xml
        if is_html:
            self.processing_instruction_class = ProcessingInstruction
        else:
            self.processing_instruction_class = XMLProcessingInstruction

        if isinstance(markup, str):
            # We were given Unicode. Maybe lxml can parse Unicode on
            # this system?
            yield markup, None, document_declared_encoding, False

        if isinstance(markup, str):
            # No, apparently not. Convert the Unicode to UTF-8 and
            # tell lxml to parse it as UTF-8.
            yield (markup.encode("utf8"), "utf8",
                   document_declared_encoding, False)

        try_encodings = [user_specified_encoding, document_declared_encoding]
        detector = EncodingDetector(
            markup, try_encodings, is_html, exclude_encodings)
        for encoding in detector.encodings:
            yield (detector.markup, encoding, document_declared_encoding, False)

    def feed(self, markup):
        if isinstance(markup, bytes):
            markup = BytesIO(markup)
        elif isinstance(markup, str):
            markup = StringIO(markup)

        # Call feed() at least once, even if the markup is empty,
        # or the parser won't be initialized.
        data = markup.read(self.CHUNK_SIZE)
        try:
            self.parser = self.parser_for(self.soup.original_encoding)
            self.parser.feed(data)
            while len(data) != 0:
                # Now call feed() on the rest of the data, chunk by chunk.
                data = markup.read(self.CHUNK_SIZE)
                if len(data) != 0:
                    self.parser.feed(data)
            self.parser.close()
        except (UnicodeDecodeError, LookupError, etree.ParserError) as e:
            raise ParserRejectedMarkup(str(e))

    def close(self):
        self.nsmaps = [self.DEFAULT_NSMAPS]

    def start(self, name, attrs, nsmap={}):
        # Make sure attrs is a mutable dict--lxml may send an immutable dictproxy.
        attrs = dict(attrs)
        nsprefix = None
        # Invert each namespace map as it comes in.
        if len(nsmap) == 0 and len(self.nsmaps) > 1:
            # There are no new namespaces for this tag, but
            # non-default namespaces are in play, so we need a
            # separate tag stack to know when they end.
            self.nsmaps.append(None)
        elif len(nsmap) > 0:
            # A new namespace mapping has come into play.
            inverted_nsmap = dict((value, key) for key, value in list(nsmap.items()))
            self.nsmaps.append(inverted_nsmap)
            # Also treat the namespace mapping as a set of attributes on the
            # tag, so we can recreate it later.
            attrs = attrs.copy()
            for prefix, namespace in list(nsmap.items()):
                attribute = NamespacedAttribute(
                    "xmlns", prefix, "http://www.w3.org/2000/xmlns/")
                attrs[attribute] = namespace

        # Namespaces are in play. Find any attributes that came in
        # from lxml with namespaces attached to their names, and
        # turn them into NamespacedAttribute objects.
        new_attrs = {}
        for attr, value in list(attrs.items()):
            namespace, attr = self._getNsTag(attr)
            if namespace is None:
                new_attrs[attr] = value
            else:
                nsprefix = self._prefix_for_namespace(namespace)
                attr = NamespacedAttribute(nsprefix, attr, namespace)
                new_attrs[attr] = value
        attrs = new_attrs

        namespace, name = self._getNsTag(name)
        nsprefix = self._prefix_for_namespace(namespace)
        self.soup.handle_starttag(name, namespace, nsprefix, attrs)

    def _prefix_for_namespace(self, namespace):
        """Find the currently active prefix for the given namespace."""
        if namespace is None:
            return None
        for inverted_nsmap in reversed(self.nsmaps):
            if inverted_nsmap is not None and namespace in inverted_nsmap:
                return inverted_nsmap[namespace]
        return None

    def end(self, name):
        self.soup.endData()
        completed_tag = self.soup.tagStack[-1]
        namespace, name = self._getNsTag(name)
        nsprefix = None
        if namespace is not None:
            for inverted_nsmap in reversed(self.nsmaps):
                if inverted_nsmap is not None and namespace in inverted_nsmap:
                    nsprefix = inverted_nsmap[namespace]
                    break
        self.soup.handle_endtag(name, nsprefix)
        if len(self.nsmaps) > 1:
            # This tag, or one of its parents, introduced a namespace
            # mapping, so pop it off the stack.
            self.nsmaps.pop()

    def pi(self, target, data):
        self.soup.endData()
        self.soup.handle_data(target + ' ' + data)
        self.soup.endData(self.processing_instruction_class)

    def data(self, content):
        self.soup.handle_data(content)

    def doctype(self, name, pubid, system):
        self.soup.endData()
        doctype = Doctype.for_name_and_ids(name, pubid, system)
        self.soup.object_was_parsed(doctype)

    def comment(self, content):
        "Handle comments as Comment objects."
        self.soup.endData()
        self.soup.handle_data(content)
        self.soup.endData(Comment)

    def test_fragment_to_document(self, fragment):
        """See `TreeBuilder`."""
        return '<?xml version="1.0" encoding="utf-8"?>\n%s' % fragment


class LXMLTreeBuilder(HTMLTreeBuilder, LXMLTreeBuilderForXML):

    NAME = LXML
    ALTERNATE_NAMES = ["lxml-html"]

    features = ALTERNATE_NAMES + [NAME, HTML, FAST, PERMISSIVE]
    is_xml = False
    processing_instruction_class = ProcessingInstruction

    def default_parser(self, encoding):
        return etree.HTMLParser

    def feed(self, markup):
        encoding = self.soup.original_encoding
        try:
            self.parser = self.parser_for(encoding)
            self.parser.feed(markup)
            self.parser.close()
        except (UnicodeDecodeError, LookupError, etree.ParserError) as e:
            raise ParserRejectedMarkup(str(e))


    def test_fragment_to_document(self, fragment):
        """See `TreeBuilder`."""
        return '<html><body>%s</body></html>' % fragment
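The `cdata_list_attributes` table in `HTMLTreeBuilder`, together with `AttrList.__setitem__` in the html5lib builder, decides whether an attribute value is split into a list of tokens. Below is a minimal standalone sketch of that lookup, not Beautiful Soup's API. It makes three assumptions: `r"\s+"` stands in for bs4's `whitespace_re`, `CDATA_LIST_ATTRIBUTES` copies only part of the real table, and `split_if_multivalued` is a hypothetical helper name.

```python
import re

# Assumption: a plain whitespace pattern standing in for bs4's whitespace_re.
WHITESPACE_RE = re.compile(r"\s+")

# Partial, illustrative copy of HTMLTreeBuilder.cdata_list_attributes.
CDATA_LIST_ATTRIBUTES = {
    "*": ["class", "accesskey", "dropzone"],
    "a": ["rel", "rev"],
    "td": ["headers"],
    "iframe": ["sandbox"],
}


def split_if_multivalued(tag_name, attr_name, value, table=CDATA_LIST_ATTRIBUTES):
    """Hypothetical helper: return a token list for multi-valued attributes."""
    # Global entries ("*") apply to every tag; others apply per tag name.
    is_list_attr = attr_name in table["*"] or (
        tag_name in table and attr_name in table[tag_name]
    )
    # A value that is already a list (e.g. on a cloned node) is left alone.
    if is_list_attr and not isinstance(value, list):
        return WHITESPACE_RE.split(value)
    return value


print(split_if_multivalued("p", "class", "foo bar"))  # ['foo', 'bar']
print(split_if_multivalued("p", "rel", "nofollow"))   # nofollow (not listed for <p>)
```

Keying the table by tag name, with a `"*"` wildcard, keeps the check down to two dictionary lookups per attribute.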
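`BeautifulSoupHTMLParser.handle_charref` treats a numeric reference below 256 as a possible Windows-1252 byte before it falls back to `chr()`, and it substitutes U+FFFD when both attempts fail. The following is a minimal standalone sketch of that fallback order. It assumes the document's own encoding is unknown, so only `windows-1252` is tried. The name `charref_to_text` is hypothetical.

```python
def charref_to_text(name):
    """Hypothetical helper: resolve the body of a numeric character reference."""
    # Hex references arrive as 'x..' or 'X..'; decimal ones as plain digits.
    if name[:1] in ("x", "X"):
        code_point = int(name[1:], 16)
    else:
        code_point = int(name)

    data = None
    if code_point < 256:
        # Try the number as a single Windows-1252 byte first, so that
        # &#147; becomes U+201C LEFT DOUBLE QUOTATION MARK.
        try:
            data = bytes([code_point]).decode("windows-1252")
        except UnicodeDecodeError:
            pass  # e.g. 0x81 is undefined in Windows-1252
    if not data:
        try:
            data = chr(code_point)
        except (ValueError, OverflowError):
            pass  # outside the Unicode code space
    return data or "\N{REPLACEMENT CHARACTER}"
```

For example, `charref_to_text("147")` and `charref_to_text("8220")` both produce the left double quotation mark. `"129"` falls through to `chr(129)` because byte 0x81 has no Windows-1252 mapping.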
zARegistryTest.test_lookup_fails_when_no_builder_implements_featurecCs*|jd}|jd}|j|jj|dS)Nr/r2)r.rrr)rbuilder1builder2rrrCtest_lookup_gets_most_recent_registration_when_no_feature_specifiedws  zPRegistryTest.test_lookup_gets_most_recent_registration_when_no_feature_specifiedcCs|j|jjddS)N)rrr)rrrr2test_lookup_fails_when_no_tree_builders_registered|sz?RegistryTest.test_lookup_fails_when_no_tree_builders_registeredcCsv|jd}|jd}|jddd}|jddd}|jd}|jd}|j|jjdd||j|jjddd|dS)Nr/r2r4Zquux)r.rrr)rZhas_oneZ has_the_otherZhas_both_earlyZ has_both_lateZ lacks_onerrrs     9PK! ,kk6tests/__pycache__/test_builder_registry.cpython-36.pycnu[3 cT@sdZddlZddlZddlmZddlmZmZm Z yddlm Z dZ Wne k r`dZ YnXyddlm Z mZdZWne k rdZYnXGd d d ejZGd d d ejZdS) zTests of the builder registry.N) BeautifulSoup)builder_registryHTMLParserTreeBuilderTreeBuilderRegistry)HTML5TreeBuilderTF)LXMLTreeBuilderForXMLLXMLTreeBuilderc@s0eZdZdZddZddZddZdd Zd S) BuiltInRegistryTestz@Test the built-in registry with the default builders registered.cCs`tr|jtjddttr0|jtjddt|jtjddttr\|jtjddtdS)NfasthtmlZ permissivexmlstricthtml5lib) LXML_PRESENT assertEqualregistrylookuprrrHTML5LIB_PRESENTr)selfr+/usr/lib/python3.6/test_builder_registry.pytest_combination sz$BuiltInRegistryTest.test_combinationcCsjtr*|jtjdt|jtjdtn<|jtjddtrT|jtjdtn|jtjdtdS)Nr r ) rrrrrrrrr)rrrrtest_lookup_by_markup_type.sz.BuiltInRegistryTest.test_lookup_by_markup_typecCsXtr,|jtjddt|jtjddttrB|jtjdt|jtjdtdS)NZlxmlr r rz html.parser) rrrrrrrrr)rrrrtest_named_library9s  z&BuiltInRegistryTest.test_named_libraryc CsJtjdd"}tdddtdddgdWdQRX|jttddddS)NT)recordr )featuresr zno-such-feature)warningscatch_warningsrZ assertRaises ValueError)rwrrr*test_beautifulsoup_constructor_does_lookupFs  z>BuiltInRegistryTest.test_beautifulsoup_constructor_does_lookupN)__name__ __module__ __qualname____doc__rrrr!rrrrr s   r c@sXeZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ dS) RegistryTestz.Test the TreeBuilderRegistry class in general.cCs 
t|_dS)N)rr)rrrrsetUpYszRegistryTest.setUpcGs,tddj|tfd|i}|jj||S)NZBuilder__r)typejoinobjectrregister)rZ feature_listclsrrrbuilder_for_features\s z!RegistryTest.builder_for_featurescCs2|j}|j|jjdd|j|jj|dS)Nfoo)r.rrr)rbuilderrrrtest_register_with_no_featurescsz+RegistryTest.test_register_with_no_featurescCs8|jdd}|j|jjd||j|jjd|dS)Nr/bar)r.rrr)rr0rrr0test_register_with_features_makes_lookup_succeedns z=RegistryTest.test_register_with_features_makes_lookup_succeedcCs$|jdd}|j|jjdddS)Nr/r2baz)r.rrr)rr0rrr4test_lookup_fails_when_no_builder_implements_featuress zARegistryTest.test_lookup_fails_when_no_builder_implements_featurecCs*|jd}|jd}|j|jj|dS)Nr/r2)r.rrr)rbuilder1builder2rrrCtest_lookup_gets_most_recent_registration_when_no_feature_specifiedws  zPRegistryTest.test_lookup_gets_most_recent_registration_when_no_feature_specifiedcCs|j|jjddS)N)rrr)rrrr2test_lookup_fails_when_no_tree_builders_registered|sz?RegistryTest.test_lookup_fails_when_no_tree_builders_registeredcCsv|jd}|jd}|jddd}|jddd}|jd}|jd}|j|jjdd||j|jjddd|dS)Nr/r2r4Zquux)r.rrr)rZhas_oneZ has_the_otherZhas_both_earlyZ has_both_lateZ lacks_onerrrs     9PK!= uu0tests/__pycache__/test_docs.cpython-36.opt-1.pycnu[3 O+@sDdZeZdgZddlZddlZddlZddlZejej Bej BZ dS)zTest harness for doctests.Zadditional_testsN) __doc__typeZ __metaclass____all__atexitZdoctestosZunittestELLIPSISZNORMALIZE_WHITESPACEZ REPORT_NDIFFZ DOCTEST_FLAGSrr/usr/lib/python3.6/test_docs.pys PK!= uu*tests/__pycache__/test_docs.cpython-36.pycnu[3 O+@sDdZeZdgZddlZddlZddlZddlZejej Bej BZ dS)zTest harness for doctests.Zadditional_testsN) __doc__typeZ __metaclass____all__atexitZdoctestosZunittestELLIPSISZNORMALIZE_WHITESPACEZ REPORT_NDIFFZ DOCTEST_FLAGSrr/usr/lib/python3.6/test_docs.pys PK!ZZ4tests/__pycache__/test_html5lib.cpython-36.opt-1.pycnu[3 6]+@sdZddlZyddlmZdZWn&ek rFZz dZWYddZ[XnXddlmZddl m Z m Z m Z e e dGd d d e e Z dS) zDTests to ensure that the html5lib tree builder generates good trees.N)HTML5TreeBuilderTF) 
SoupStrainer)HTML5TreeBuilderSmokeTestSoupTestskipIfz?html5lib seems not to be present, not testing its tree builder.c@sleZdZdZeddZddZddZdd Zd d Z d d Z ddZ ddZ ddZ ddZddZdS)HTML5LibBuilderSmokeTestz"See ``HTML5TreeBuilderSmokeTest``.cCstS)N)r)selfr #/usr/lib/python3.6/test_html5lib.pydefault_buildersz(HTML5LibBuilderSmokeTest.default_builderc Csdtd}d}tjdd}|j||d}WdQRX|j|j|j||jdt|dj kdS)Nbz

A bold statement.

T)record)Z parse_onlyz4the html5lib tree builder doesn't support parse_onlyr) rwarningscatch_warningssoup assertEqualdecodeZ document_forZ assertTruestrmessage)rZstrainermarkupwrr r r test_soupstrainersz*HTML5LibBuilderSmokeTest.test_soupstrainercCsd}|j|d|jddS)z8html5lib inserts tags where other parsers don't.z[z
Here's another table:
foo
Here's another table:
foo
z{
Foo
Bar
Baz
N)ZassertSoupEquals)rrr r r test_correctly_nested_tables(s z5HTML5LibBuilderSmokeTest.test_correctly_nested_tablescCs$d}|j|}|jd|jjdS)Nzy

foo

s

foo

)rrpencode)rrrr r r (test_xml_declaration_followed_by_doctype<s  zAHTML5LibBuilderSmokeTest.test_xml_declaration_followed_by_doctypecCs:d}|j|}|jd|jj|jdt|jddS)Nz%

foo

bar

zD

foo

bar

r)rrbodyrlenfind_all)rrrr r r test_reparented_markupJs z/HTML5LibBuilderSmokeTest.test_reparented_markupcCs:d}|j|}|jd|jj|jdt|jddS)Nz&

foo

bar

zE

foo

bar

rr)rrrrrr)rrrr r r +test_reparented_markup_ends_with_whitespaceQs zDHTML5LibBuilderSmokeTest.test_reparented_markup_ends_with_whitespacecCs0d}|j|}|jdd\}}|jd\}}dS)zVerify that we keep the two whitespace nodes in this document distinct when reparenting the adjacent tags. z,
 )stringZtbodyN)rr)rrrZspace1Zspace2Ztbody1Ztbody2r r r aftermath

aftermath

target)r#Z aftermath)rnoscriptrZ next_elementfindrZprevious_element)rrrr(r%Zfinal_aftermathr r r *test_reparented_markup_containing_childrenbs  zCHTML5LibBuilderSmokeTest.test_reparented_markup_containing_childrencCsd}|j|}dS)z(Processing instructions become comments.sN)r)rrrr r r test_processing_instructionrs z4HTML5LibBuilderSmokeTest.test_processing_instructioncCs,d}|j|}|jd\}}|j||dS)Ns

a)rrr)rrrZa1Za2r r r test_cloned_multivalue_nodexs   z4HTML5LibBuilderSmokeTest.test_cloned_multivalue_nodecCs$d}|j|}|jd|jjdS)NsAz>A
)rrrr)rrrr r r test_foster_parentings z.HTML5LibBuilderSmokeTest.test_foster_parentingN)__name__ __module__ __qualname____doc__propertyr rrrr r!r$r*r+r-r.r r r r rs   r)r2rZ bs4.builderrZHTML5LIB_PRESENT ImportErroreZ bs4.elementrZ bs4.testingrrrrr r r r s  PK!0e,.tests/__pycache__/test_html5lib.cpython-36.pycnu[3 6]+@sdZddlZyddlmZdZWn&ek rFZz dZWYddZ[XnXddlmZddl m Z m Z m Z e e dGd d d e e Z dS) zDTests to ensure that the html5lib tree builder generates good trees.N)HTML5TreeBuilderTF) SoupStrainer)HTML5TreeBuilderSmokeTestSoupTestskipIfz?html5lib seems not to be present, not testing its tree builder.c@sleZdZdZeddZddZddZdd Zd d Z d d Z ddZ ddZ ddZ ddZddZdS)HTML5LibBuilderSmokeTestz"See ``HTML5TreeBuilderSmokeTest``.cCstS)N)r)selfr #/usr/lib/python3.6/test_html5lib.pydefault_buildersz(HTML5LibBuilderSmokeTest.default_builderc Csdtd}d}tjdd}|j||d}WdQRX|j|j|j||jdt|dj kdS)Nbz

A bold statement.

T)record)Z parse_onlyz4the html5lib tree builder doesn't support parse_onlyr) rwarningscatch_warningssoup assertEqualdecodeZ document_forZ assertTruestrmessage)rZstrainermarkupwrr r r test_soupstrainersz*HTML5LibBuilderSmokeTest.test_soupstrainercCsd}|j|d|jddS)z8html5lib inserts tags where other parsers don't.z[z
Here's another table:
foo
Here's another table:
foo
z{
Foo
Bar
Baz
N)ZassertSoupEquals)rrr r r test_correctly_nested_tables(s z5HTML5LibBuilderSmokeTest.test_correctly_nested_tablescCs$d}|j|}|jd|jjdS)Nzy

foo

s

foo

)rrpencode)rrrr r r (test_xml_declaration_followed_by_doctype<s  zAHTML5LibBuilderSmokeTest.test_xml_declaration_followed_by_doctypecCs:d}|j|}|jd|jj|jdt|jddS)Nz%

foo

bar

zD

foo

bar

r)rrbodyrlenfind_all)rrrr r r test_reparented_markupJs z/HTML5LibBuilderSmokeTest.test_reparented_markupcCs:d}|j|}|jd|jj|jdt|jddS)Nz&

foo

bar

zE

foo

bar

rr)rrrrrr)rrrr r r +test_reparented_markup_ends_with_whitespaceQs zDHTML5LibBuilderSmokeTest.test_reparented_markup_ends_with_whitespacecCsLd}|j|}|jdd\}}|jd\}}|j|ks:t|j|ksHtdS)zVerify that we keep the two whitespace nodes in this document distinct when reparenting the adjacent tags. z,
 )stringZtbodyN)rr next_elementAssertionError)rrrZspace1Zspace2Ztbody1Ztbody2r r r aftermath

aftermath

target)r#Z aftermath)rnoscriptrr$findrZprevious_element)rrrr*r'Zfinal_aftermathr r r *test_reparented_markup_containing_childrenbs  zCHTML5LibBuilderSmokeTest.test_reparented_markup_containing_childrencCs$d}|j|}t|jds tdS)z(Processing instructions become comments.szN)rr startswithr%)rrrr r r test_processing_instructionrs z4HTML5LibBuilderSmokeTest.test_processing_instructioncCs8d}|j|}|jd\}}|j||||k s4tdS)Ns

a)rrrr%)rrrZa1Za2r r r test_cloned_multivalue_nodexs   z4HTML5LibBuilderSmokeTest.test_cloned_multivalue_nodecCs$d}|j|}|jd|jjdS)NsAz>A
)rrrr)rrrr r r test_foster_parentings z.HTML5LibBuilderSmokeTest.test_foster_parentingN)__name__ __module__ __qualname____doc__propertyr rrrr r!r&r,r.r0r1r r r r rs   r)r5rZ bs4.builderrZHTML5LIB_PRESENT ImportErroreZ bs4.elementrZ bs4.testingrrrrr r r r s  PK!]_@u 6tests/__pycache__/test_htmlparser.cpython-36.opt-1.pycnu[3 Y=K[@sfdZddlmZddlZddlmZmZddlmZddl m Z GdddeeZ Gd d d eZ dS) zGTests to ensure that the html.parser tree builder generates good trees.) set_traceN)SoupTestHTMLTreeBuilderSmokeTest)HTMLParserTreeBuilder)BeautifulSoupHTMLParserc@s@eZdZeddZddZddZddZd d Zd d Z d S)HTMLParserTreeBuilderSmokeTestcCstS)N)r)selfr %/usr/lib/python3.6/test_htmlparser.pydefault_builder sz.HTMLParserTreeBuilderSmokeTest.default_buildercCsdS)Nr )rr r r test_namespaced_system_doctypesz=HTMLParserTreeBuilderSmokeTest.test_namespaced_system_doctypecCsdS)Nr )rr r r test_namespaced_public_doctypesz=HTMLParserTreeBuilderSmokeTest.test_namespaced_public_doctypecCs<|jd}tj|d}tj|}|jt|jt|jdS)zfUnlike most tree builders, HTMLParserTreeBuilder and will be restored after pickling. z fooN)ZsouppickledumpsloadsZ assertTrue isinstanceZbuildertype)rZtreeZdumpedZloadedr r r test_builder_is_pickleds   z6HTMLParserTreeBuilderSmokeTest.test_builder_is_pickledcCs|jdd|jdddS)Nz





z


z


)assertSoupEquals)rr r r )test_redundant_empty_element_closing_tags!s zHHTMLParserTreeBuilderSmokeTest.test_redundant_empty_element_closing_tagscCs|jdddS)Nz foo &# barzfoo &# bar)r)rr r r test_empty_element%sz1HTMLParserTreeBuilderSmokeTest.test_empty_elementN) __name__ __module__ __qualname__propertyr r r rrrr r r r r s   rc@seZdZddZdS)TestHTMLParserSubclasscCst}|jddS)zlVerify that our HTMLParser subclass implements error() in a way that doesn't cause a crash. z don't crashN)rerror)rparserr r r test_error,sz!TestHTMLParserSubclass.test_errorN)rrrr r r r r r+sr) __doc__ZpdbrrZ bs4.testingrrZ bs4.builderrZbs4.builder._htmlparserrrrr r r r s   !PK!]_@u 0tests/__pycache__/test_htmlparser.cpython-36.pycnu[3 Y=K[@sfdZddlmZddlZddlmZmZddlmZddl m Z GdddeeZ Gd d d eZ dS) zGTests to ensure that the html.parser tree builder generates good trees.) set_traceN)SoupTestHTMLTreeBuilderSmokeTest)HTMLParserTreeBuilder)BeautifulSoupHTMLParserc@s@eZdZeddZddZddZddZd d Zd d Z d S)HTMLParserTreeBuilderSmokeTestcCstS)N)r)selfr %/usr/lib/python3.6/test_htmlparser.pydefault_builder sz.HTMLParserTreeBuilderSmokeTest.default_buildercCsdS)Nr )rr r r test_namespaced_system_doctypesz=HTMLParserTreeBuilderSmokeTest.test_namespaced_system_doctypecCsdS)Nr )rr r r test_namespaced_public_doctypesz=HTMLParserTreeBuilderSmokeTest.test_namespaced_public_doctypecCs<|jd}tj|d}tj|}|jt|jt|jdS)zfUnlike most tree builders, HTMLParserTreeBuilder and will be restored after pickling. z fooN)ZsouppickledumpsloadsZ assertTrue isinstanceZbuildertype)rZtreeZdumpedZloadedr r r test_builder_is_pickleds   z6HTMLParserTreeBuilderSmokeTest.test_builder_is_pickledcCs|jdd|jdddS)Nz





z


z


)assertSoupEquals)rr r r )test_redundant_empty_element_closing_tags!s zHHTMLParserTreeBuilderSmokeTest.test_redundant_empty_element_closing_tagscCs|jdddS)Nz foo &# barzfoo &# bar)r)rr r r test_empty_element%sz1HTMLParserTreeBuilderSmokeTest.test_empty_elementN) __name__ __module__ __qualname__propertyr r r rrrr r r r r s   rc@seZdZddZdS)TestHTMLParserSubclasscCst}|jddS)zlVerify that our HTMLParser subclass implements error() in a way that doesn't cause a crash. z don't crashN)rerror)rparserr r r test_error,sz!TestHTMLParserSubclass.test_errorN)rrrr r r r r r+sr) __doc__ZpdbrrZ bs4.testingrrZ bs4.builderrZbs4.builder._htmlparserrrrr r r r s   !PK!ٺ 0tests/__pycache__/test_lxml.cpython-36.opt-1.pycnu[3 6]U @sdZddlZddlZyddlZdZejjZWn*ek rVZ zdZdZWYddZ [ XnXerlddl m Z m Z ddl mZmZddlmZmZmZddlmZdd lmZdd lmZmZmZmZee d Gd d d eeZee dGdddeeZdS)z@Tests to ensure that the lxml tree builder generates good trees.NTF)LXMLTreeBuilderLXMLTreeBuilderForXML) BeautifulSoupBeautifulStoneSoup)CommentDoctype SoupStrainer)skipIf)test_htmlparser)HTMLTreeBuilderSmokeTestXMLTreeBuilderSmokeTestSoupTestr z;lxml seems not to be present, not testing its tree builder.c@sPeZdZdZeddZddZddZee p6e dkd d dZ ddZ dS)LXMLTreeBuilderSmokeTestz!See ``HTMLTreeBuilderSmokeTest``.cCstS)N)r)selfr/usr/lib/python3.6/test_lxml.pydefault_builder%sz(LXMLTreeBuilderSmokeTest.default_buildercCs(|jdd|jdd|jdddS)Nz

foo�bar

z

foobar

z

foo�bar

z

foo�bar

)ZassertSoupEquals)rrrrtest_out_of_range_entity)s z1LXMLTreeBuilderSmokeTest.test_out_of_range_entitycCsdS)Nr)rrrr*test_entities_in_foreign_document_encoding1szCLXMLTreeBuilderSmokeTest.test_entities_in_foreign_document_encodingrz@Skipping doctype test for old version of lxml to avoid segfault.cCs(|jd}|jd}|jd|jdS)Nz r)soupcontents assertEqualstrip)rrZdoctyperrrtest_empty_doctype:s  z+LXMLTreeBuilderSmokeTest.test_empty_doctypec CsNtjdd}td}WdQRX|jdt|j|jdt|djkdS)NT)recordzzz&BeautifulStoneSoup class is deprecatedr)warningscatch_warningsrrstrbZ assertTruemessage)rwrrrr%test_beautifulstonesoup_is_xml_parserBsz>LXMLTreeBuilderSmokeTest.test_beautifulstonesoup_is_xml_parserN)rrrr) __name__ __module__ __qualname____doc__propertyrrrr LXML_PRESENT LXML_VERSIONrr%rrrrrs   rz?lxml seems not to be present, not testing its XML tree builder.c@seZdZdZeddZdS)LXMLXMLTreeBuilderSmokeTestz!See ``HTMLTreeBuilderSmokeTest``.cCstS)N)r)rrrrrPsz+LXMLXMLTreeBuilderSmokeTest.default_builderN)r&r'r(r)r*rrrrrr-Jsr-)r)r)rerZ lxml.etreeZlxmlr+Zetreer, ImportErroreZ bs4.builderrrZbs4rrZ bs4.elementrrrZ bs4.testingr Z bs4.testsr r r r rr-rrrrs0    (PK!ٺ *tests/__pycache__/test_lxml.cpython-36.pycnu[3 6]U @sdZddlZddlZyddlZdZejjZWn*ek rVZ zdZdZWYddZ [ XnXerlddl m Z m Z ddl mZmZddlmZmZmZddlmZdd lmZdd lmZmZmZmZee d Gd d d eeZee dGdddeeZdS)z@Tests to ensure that the lxml tree builder generates good trees.NTF)LXMLTreeBuilderLXMLTreeBuilderForXML) BeautifulSoupBeautifulStoneSoup)CommentDoctype SoupStrainer)skipIf)test_htmlparser)HTMLTreeBuilderSmokeTestXMLTreeBuilderSmokeTestSoupTestr z;lxml seems not to be present, not testing its tree builder.c@sPeZdZdZeddZddZddZee p6e dkd d dZ ddZ dS)LXMLTreeBuilderSmokeTestz!See ``HTMLTreeBuilderSmokeTest``.cCstS)N)r)selfr/usr/lib/python3.6/test_lxml.pydefault_builder%sz(LXMLTreeBuilderSmokeTest.default_buildercCs(|jdd|jdd|jdddS)Nz

foo�bar

z

foobar

z

foo�bar

z

foo�bar

)ZassertSoupEquals)rrrrtest_out_of_range_entity)s z1LXMLTreeBuilderSmokeTest.test_out_of_range_entitycCsdS)Nr)rrrr*test_entities_in_foreign_document_encoding1szCLXMLTreeBuilderSmokeTest.test_entities_in_foreign_document_encodingrz@Skipping doctype test for old version of lxml to avoid segfault.cCs(|jd}|jd}|jd|jdS)Nz r)soupcontents assertEqualstrip)rrZdoctyperrrtest_empty_doctype:s  z+LXMLTreeBuilderSmokeTest.test_empty_doctypec CsNtjdd}td}WdQRX|jdt|j|jdt|djkdS)NT)recordzzz&BeautifulStoneSoup class is deprecatedr)warningscatch_warningsrrstrbZ assertTruemessage)rwrrrr%test_beautifulstonesoup_is_xml_parserBsz>LXMLTreeBuilderSmokeTest.test_beautifulstonesoup_is_xml_parserN)rrrr) __name__ __module__ __qualname____doc__propertyrrrr LXML_PRESENT LXML_VERSIONrr%rrrrrs   rz?lxml seems not to be present, not testing its XML tree builder.c@seZdZdZeddZdS)LXMLXMLTreeBuilderSmokeTestz!See ``HTMLTreeBuilderSmokeTest``.cCstS)N)r)rrrrrPsz+LXMLXMLTreeBuilderSmokeTest.default_builderN)r&r'r(r)r*rrrrrr-Jsr-)r)r)rerZ lxml.etreeZlxmlr+Zetreer, ImportErroreZ bs4.builderrrZbs4rrZ bs4.elementrrrZ bs4.testingr Z bs4.testsr r r r rr-rrrrs0    (PK!ʮs8U8U0tests/__pycache__/test_soup.cpython-36.opt-1.pycnu[3 *6]dO@s~dZddlmZddlZddlZddlZddlZddlmZm Z ddl m Z m Z m Z mZddlZddlmZmZmZddlmZmZddlZyddlmZmZd ZWn&ek rZz d ZWYddZ[XnXejdd koejdkZGd ddeZGdddeZ GdddeZ GdddeZ!Gdddej"Z#GdddeZ$Gdddej"Z%GdddeZ&Gdddej"Z'dS)z#Tests of Beautiful Soup as a whole.) set_traceN) BeautifulSoupBeautifulStoneSoup)CharsetMetaAttributeValueContentMetaAttributeValue SoupStrainerNamespacedAttribute)EntitySubstitution UnicodeDammitEncodingDetector)SoupTestskipIf)LXMLTreeBuilderLXMLTreeBuilderForXMLTFc@s$eZdZddZddZddZdS)TestConstructorcCs"d}|j|}|jd|jjdS)Nu

éé

uéé)soup assertEqualh1string)selfdatarr/usr/lib/python3.6/test_soup.pytest_short_unicode_input*s z(TestConstructor.test_short_unicode_inputcCs"d}|j|}|jd|jjdS)Nz

foobar

zfoobar)rrrr)rrrrrrtest_embedded_null/s z"TestConstructor.test_embedded_nullcCs,djd}|j|dgd}|jd|jdS)Nu Räksmörgåszutf-8)exclude_encodingsz windows-1252)encoderroriginal_encoding)r utf8_datarrrrtest_exclude_encodings4s z&TestConstructor.test_exclude_encodingsN)__name__ __module__ __qualname__rrr!rrrrr(src@sFeZdZdddZddZddZdd Zd d Zd d ZddZ dS) TestWarningsTcCs"|jtjdd}|j|dS)NP) startswithrZNO_PARSER_SPECIFIED_WARNING assertTrue)rsZis_therevrrr_no_parser_specified<sz!TestWarnings._no_parser_specifiedc Cs>tjdd}|jd}WdQRXt|dj}|j|dS)NT)recordzr)warningscatch_warningsrstrmessage_assert_no_parser_specified)rwrmsgrrr#test_warning_if_no_parser_specified@sz0TestWarnings.test_warning_if_no_parser_specifiedc Cs@tjdd}|jdd}WdQRXt|dj}|j|dS)NT)r,zhtmlr)r-r.rr/r0r1)rr2rr3rrr*test_warning_if_parser_specified_too_vagueFsz7TestWarnings.test_warning_if_parser_specified_too_vaguec Cs4tjdd}|jdd}WdQRX|jg|dS)NT)r,zz html.parser)r-r.rr)rr2rrrr,test_no_warning_if_explicit_parser_specifiedLsz9TestWarnings.test_no_warning_if_explicit_parser_specifiedc Cshtjdd}|jdtdd}WdQRXt|dj}|jd|k|jd|k|jd |jdS) NT)r,zb)parseOnlyTheserr9 parse_onlys) r-r.rrr/r0r(rr)rr2rr3rrr)test_parseOnlyThese_renamed_to_parse_onlyQs z6TestWarnings.test_parseOnlyThese_renamed_to_parse_onlyc Csftjdd}d}|j|dd}WdQRXt|dj}|jd|k|jd|k|jd|jdS) NT)r,séutf8) fromEncodingrr=Z from_encoding)r-r.rr/r0r(rr)rr2r<rr3rrr*test_fromEncoding_renamed_to_from_encodingYsz7TestWarnings.test_fromEncoding_renamed_to_from_encodingcCs|jt|jddddS)NzT)Zno_such_argument) assertRaises TypeErrorr)rrrr"test_unrecognized_keyword_argumentbsz/TestWarnings.test_unrecognized_keyword_argumentN)T) r"r#r$r+r4r6r7r;r>rArrrrr%:s  r%c@s4eZdZddZddZddZddZd d Zd S) r%cCstj}|j}zBtjdd}|j|}WdQRXt|dj}|jd|kWd|j Xtjdd}|j|}WdQRX|j dt |dS)NT)r,rzlooks like a filename) tempfileZNamedTemporaryFilenamer-r.rr/r0r(closerlen)rZ filehandlefilenamer2rr3rrrtest_disk_file_warninghs z#TestWarnings.test_disk_file_warningc 
Cs>tjdd}|jd}WdQRX|jtdd|DdS)NT)r,shttp://www.crummybytes.com/css|]}dt|jkVqdS)zlooks like a URLN)r/r0).0r2rrr }sz?TestWarnings.test_url_warning_with_bytes_url..)r-r.rr(any)r warning_listrrrrtest_url_warning_with_bytes_urlxs z,TestWarnings.test_url_warning_with_bytes_urlc Cs>tjdd}|jd}WdQRX|jtdd|DdS)NT)r,zhttp://www.crummyunicode.com/css|]}dt|jkVqdS)zlooks like a URLN)r/r0)rHr2rrrrIszATestWarnings.test_url_warning_with_unicode_url..)r-r.rr(rJ)rrKrrrr!test_url_warning_with_unicode_urls z.TestWarnings.test_url_warning_with_unicode_urlc Cs>tjdd}|jd}WdQRX|jtdd|DdS)NT)r,s$http://www.crummybytes.com/ is greatcss|]}dt|jkVqdS)zlooks like a URLN)r/r0)rHr2rrrrIszETestWarnings.test_url_warning_with_bytes_and_space..)r-r.r assertFalserJ)rrKrrrr%test_url_warning_with_bytes_and_spaces z2TestWarnings.test_url_warning_with_bytes_and_spacec Cs>tjdd}|jd}WdQRX|jtdd|DdS)NT)r,z%http://www.crummyuncode.com/ is greatcss|]}dt|jkVqdS)zlooks like a URLN)r/r0)rHr2rrrrIszGTestWarnings.test_url_warning_with_unicode_and_space..)r-r.rrNrJ)rrKrrrr'test_url_warning_with_unicode_and_spaces z4TestWarnings.test_url_warning_with_unicode_and_spaceN)r"r#r$rGrLrMrOrPrrrrr%fs c@seZdZddZdS)TestSelectiveParsingcCs.d}td}|j||d}|j|jddS)Nz&NoYesNoYes Yesr8)r:sYesYes Yes)rrrr)rmarkupZstrainerrrrrtest_parse_with_soupstrainersz1TestSelectiveParsing.test_parse_with_soupstrainerN)r"r#r$rSrrrrrQsrQc@sxeZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ ddZ ddZddZddZdS)TestEntitySubstitutionz1Standalone tests of the EntitySubstitution class.cCs t|_dS)N)r sub)rrrrsetUpszTestEntitySubstitution.setUpcCsd}|j|jj|ddS)Nufoo\u2200☃\u00f5barufoo∀☃õbar)rrUsubstitute_html)rr)rrrtest_simple_html_substitutionsz4TestEntitySubstitution.test_simple_html_substitutioncCs&d}t|}|j|jj|jddS)Nsfooz‘’foo“”)r rrUrWrR)rZquotesdammitrrrtest_smart_quote_substitutionsz4TestEntitySubstitution.test_smart_quote_substitutioncCsd}|j|jj|d|dS)NzWelcome to "my 
bar"F)rrUsubstitute_xml)rr)rrrItest_xml_converstion_includes_no_quotes_if_make_quoted_attribute_is_falsesz`TestEntitySubstitution.test_xml_converstion_includes_no_quotes_if_make_quoted_attribute_is_falsecCs0|j|jjddd|j|jjddddS)NZWelcomeTz "Welcome"z Bob's Barz "Bob's Bar")rrUr[)rrrr6test_xml_attribute_quoting_normally_uses_double_quotesszMTestEntitySubstitution.test_xml_attribute_quoting_normally_uses_double_quotescCsd}|j|jj|dddS)NzWelcome to "my bar"Tz'Welcome to "my bar"')rrUr[)rr)rrrOtest_xml_attribute_quoting_uses_single_quotes_when_value_contains_double_quotesszfTestEntitySubstitution.test_xml_attribute_quoting_uses_single_quotes_when_value_contains_double_quotescCsd}|j|jj|dddS)NzWelcome to "Bob's Bar"Tz""Welcome to "Bob's Bar"")rrUr[)rr)rrrbtest_xml_attribute_quoting_escapes_single_quotes_when_value_contains_both_single_and_double_quotess zyTestEntitySubstitution.test_xml_attribute_quoting_escapes_single_quotes_when_value_contains_both_single_and_double_quotescCsd}|j|jj||dS)NzWelcome to "Bob's Bar")rrUr[)rZquotedrrrzfoo<bar>)rrUr[)rrrr'test_xml_quoting_handles_angle_bracketss z>TestEntitySubstitution.test_xml_quoting_handles_angle_bracketscCs|j|jjdddS)NzAT&TzAT&T)rrUr[)rrrr#test_xml_quoting_handles_ampersandssz:TestEntitySubstitution.test_xml_quoting_handles_ampersandscCs|j|jjdddS)Nz ÁT&Tz&Aacute;T&T)rrUr[)rrrrEtest_xml_quoting_including_ampersands_when_they_are_part_of_an_entitys z\TestEntitySubstitution.test_xml_quoting_including_ampersands_when_they_are_part_of_an_entitycCs|j|jjdddS)Nz ÁT&TzÁT&T)rrUZ"substitute_xml_containing_entities)rrrrDtest_xml_quoting_ignoring_ampersands_when_they_are_part_of_an_entitys z[TestEntitySubstitution.test_xml_quoting_ignoring_ampersands_when_they_are_part_of_an_entitycCsd}|j|jj||dS)z:There's no need to do this except inside attribute values.z Bob's "bar"N)rrUrW)rtextrrr 
test_quotes_not_html_substitutedsz7TestEntitySubstitution.test_quotes_not_html_substitutedN)r"r#r$__doc__rVrXrZr\r]r^r_r`rarbrcrdrfrrrrrTsrTcsNeZdZfddZddZddZddZd d Zee d d d Z Z S)TestEncodingConversioncs4tt|jd|_|jjd|_|j|jddS)NuUSacré bleu!zutf-8sUSacré bleu!)superrhrV unicode_datarr r)r) __class__rrrVs zTestEncodingConversion.setUpc Cstjj}tjtjzbdd}|tj_d}|j|}|j}|jt |t |j ||j |j|j |j jdWdtjtj|tj_XdS)NcSsdS)Nr)r/rrrnoopsz>TestEncodingConversion.test_ascii_in_unicode_out..noops azutf-8)bs4rYchardet_dammitloggingdisableWARNINGrdecoder( isinstancer/rZ document_forrlowerNOTSET)rchardetrlasciiZsoup_from_asciiZunicode_outputrrrtest_ascii_in_unicode_outs   z0TestEncodingConversion.test_ascii_in_unicode_outcCs@|j|j}|j|j|j|j|jjd|j|jddS)Nu Sacré bleu!)rrjrrrfoorr)rsoup_from_unicoderrrtest_unicode_in_unicode_outs z2TestEncodingConversion.test_unicode_in_unicode_outcCs2|j|j}|j|j|j|j|jjddS)Nu Sacré bleu!)rr rrrrjryr)rZsoup_from_utf8rrrtest_utf8_in_unicode_out s z/TestEncodingConversion.test_utf8_in_unicode_outcCs$|j|j}|j|jd|jdS)Nzutf-8)rrjrrr )rrzrrr test_utf8_outs z$TestEncodingConversion.test_utf8_outzQBad HTMLParser detected; skipping test of non-ASCII characters in attribute name.cCs(d}|j|j|jjd|jddS)Nu
r<)rrZdivr)rrRrrr1test_attribute_name_containing_unicode_charactersszHTestEncodingConversion.test_attribute_name_containing_unicode_characters) r"r#r$rVrxr{r|r}r PYTHON_3_PRE_3_2r~ __classcell__rr)rkrrhs rhc@seZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ ddZ ddZddZddZddZddZd d!Zd"d#Zd$S)%TestUnicodeDammitz"Standalone tests of UnicodeDammit.cCsd}t|}|j|j|dS)NuI'm already Unicode! ☃)r runicode_markup)rrRrYrrrtest_unicode_input"sz$TestUnicodeDammit.test_unicode_inputcCsd}t|}|j|jddS)Nsz#\u2018\u2019\u201c\u201d)r rr)rrRrYrrrtest_smart_quotes_to_unicode'sz.TestUnicodeDammit.test_smart_quotes_to_unicodecCs"d}t|dd}|j|jddS)NsZxml)smart_quotes_toz+‘’“”)r rr)rrRrYrrr!test_smart_quotes_to_xml_entities-s z3TestUnicodeDammit.test_smart_quotes_to_xml_entitiescCs"d}t|dd}|j|jddS)Nsr5)rz'‘’“”)r rr)rrRrYrrr"test_smart_quotes_to_html_entities3s z4TestUnicodeDammit.test_smart_quotes_to_html_entitiescCs"d}t|dd}|j|jddS)Nsrw)rz''"")r rr)rrRrYrrrtest_smart_quotes_to_ascii9s z,TestUnicodeDammit.test_smart_quotes_to_asciicCs0d}t|}|j|jjd|j|jddS)NsSacré bleu! ☃zutf-8uSacré bleu! ☃)r rrrtr)rr<rYrrrtest_detect_utf8?sz"TestUnicodeDammit.test_detect_utf8cCs4d}t|dg}|j|jjd|j|jddS)Nsz iso-8859-8z\u05dd\u05d5\u05dc\u05e9)r rrrtr)rhebrewrYrrrtest_convert_hebrewFs z%TestUnicodeDammit.test_convert_hebrewcCs6d}t|}|j|jjd|j|jjd|dS)Nsケータイ Watchzutf-8)r rrrtrr)rutf_8rYrrr/test_dont_see_smart_quotes_where_there_are_noneLszATestUnicodeDammit.test_dont_see_smart_quotes_where_there_are_nonecCs,djd}t|dg}|j|jjddS)Nu Räksmörgåszutf-8z iso-8859-8)rr rrrt)rr rYrrr test_ignore_inappropriate_codecsRs  z2TestUnicodeDammit.test_ignore_inappropriate_codecscCs:djd}x*dD]"}t||g}|j|jjdqWdS)Nu Räksmörgåszutf-8.utf8... 
utF---16.!)rrr)rr rrrt)rr Z bad_encodingrYrrrtest_ignore_invalid_codecsWs   z,TestUnicodeDammit.test_ignore_invalid_codecscCsLdjd}t|dgd}|j|jjdt|ddgd}|j|jddS)Nu Räksmörgåszutf-8)rz windows-1252)rr rrrt)rr rYrrrr!]s  z(TestUnicodeDammit.test_exclude_encodingscCstd}t|j}dS)Ns')r list encodings)rZdetectedrrrrPtest_encoding_detector_replaces_junk_in_encoding_name_with_replacement_characterks zbTestUnicodeDammit.test_encoding_detector_replaces_junk_in_encoding_name_with_replacement_charactercCs,x&dD]}t|dd}|jd|jqWdS) N&&$#T)Zis_htmlzeuc-jp)rrrr)r rr)rrrYrrr test_detect_html5_style_meta_tagqs z2TestUnicodeDammit.test_detect_html5_style_meta_tagc Csd}tjj}tjtjzPdd}|tj_t|}|jd|j|j d|j kt |d}|j |jWdtjtj |tj_XdS)NsT بتر ѐcSsdS)Nr)r/rrrrlszBTestUnicodeDammit.test_last_ditch_entity_replacement..noopTz\ufffdz html.parser) rmrYrnrorprqr rZcontains_replacement_charactersr(rrru)rdocrvrlrYrrrr"test_last_ditch_entity_replacement|s   z4TestUnicodeDammit.test_last_ditch_entity_replacementcCs,d}t|}|jd|j|jd|jdS)Ns<a></a>u áézutf-16le)r rrr)rrrYrrrtest_byte_order_mark_removedsz.TestUnicodeDammit.test_byte_order_mark_removedcCsPdjd}djd}|||}|jt|jdtj|}|jd|jddS)Nu☃rr<u“Hi, I like Windows!” windows_1252u+☃☃☃“Hi, I like Windows!”☃☃☃u ☃☃☃)rr?UnicodeDecodeErrorrrr detwingler)rr<rrZfixedrrrtest_detwingles   z TestUnicodeDammit.test_detwinglecCsBxs6 ,/ E</PK!>`U`U*tests/__pycache__/test_soup.cpython-36.pycnu[3 *6]dO@s~dZddlmZddlZddlZddlZddlZddlmZm Z ddl m Z m Z m Z mZddlZddlmZmZmZddlmZmZddlZyddlmZmZd ZWn&ek rZz d ZWYddZ[XnXejdd koejdkZGd ddeZGdddeZ GdddeZ GdddeZ!Gdddej"Z#GdddeZ$Gdddej"Z%GdddeZ&Gdddej"Z'dS)z#Tests of Beautiful Soup as a whole.) set_traceN) BeautifulSoupBeautifulStoneSoup)CharsetMetaAttributeValueContentMetaAttributeValue SoupStrainerNamespacedAttribute)EntitySubstitution UnicodeDammitEncodingDetector)SoupTestskipIf)LXMLTreeBuilderLXMLTreeBuilderForXMLTFc@s$eZdZddZddZddZdS)TestConstructorcCs"d}|j|}|jd|jjdS)Nu

éé

uéé)soup assertEqualh1string)selfdatarr/usr/lib/python3.6/test_soup.pytest_short_unicode_input*s z(TestConstructor.test_short_unicode_inputcCs"d}|j|}|jd|jjdS)Nz

foobar

zfoobar)rrrr)rrrrrrtest_embedded_null/s z"TestConstructor.test_embedded_nullcCs,djd}|j|dgd}|jd|jdS)Nu Räksmörgåszutf-8)exclude_encodingsz windows-1252)encoderroriginal_encoding)r utf8_datarrrrtest_exclude_encodings4s z&TestConstructor.test_exclude_encodingsN)__name__ __module__ __qualname__rrr!rrrrr(src@sFeZdZdddZddZddZdd Zd d Zd d ZddZ dS) TestWarningsTcCs"|jtjdd}|j|dS)NP) startswithrZNO_PARSER_SPECIFIED_WARNING assertTrue)rsZis_therevrrr_no_parser_specified<sz!TestWarnings._no_parser_specifiedc Cs>tjdd}|jd}WdQRXt|dj}|j|dS)NT)recordzr)warningscatch_warningsrstrmessage_assert_no_parser_specified)rwrmsgrrr#test_warning_if_no_parser_specified@sz0TestWarnings.test_warning_if_no_parser_specifiedc Cs@tjdd}|jdd}WdQRXt|dj}|j|dS)NT)r,zhtmlr)r-r.rr/r0r1)rr2rr3rrr*test_warning_if_parser_specified_too_vagueFsz7TestWarnings.test_warning_if_parser_specified_too_vaguec Cs4tjdd}|jdd}WdQRX|jg|dS)NT)r,zz html.parser)r-r.rr)rr2rrrr,test_no_warning_if_explicit_parser_specifiedLsz9TestWarnings.test_no_warning_if_explicit_parser_specifiedc Cshtjdd}|jdtdd}WdQRXt|dj}|jd|k|jd|k|jd |jdS) NT)r,zb)parseOnlyTheserr9 parse_onlys) r-r.rrr/r0r(rr)rr2rr3rrr)test_parseOnlyThese_renamed_to_parse_onlyQs z6TestWarnings.test_parseOnlyThese_renamed_to_parse_onlyc Csftjdd}d}|j|dd}WdQRXt|dj}|jd|k|jd|k|jd|jdS) NT)r,séutf8) fromEncodingrr=Z from_encoding)r-r.rr/r0r(rr)rr2r<rr3rrr*test_fromEncoding_renamed_to_from_encodingYsz7TestWarnings.test_fromEncoding_renamed_to_from_encodingcCs|jt|jddddS)NzT)Zno_such_argument) assertRaises TypeErrorr)rrrr"test_unrecognized_keyword_argumentbsz/TestWarnings.test_unrecognized_keyword_argumentN)T) r"r#r$r+r4r6r7r;r>rArrrrr%:s  r%c@s4eZdZddZddZddZddZd d Zd S) r%cCstj}|j}zBtjdd}|j|}WdQRXt|dj}|jd|kWd|j Xtjdd}|j|}WdQRX|j dt |dS)NT)r,rzlooks like a filename) tempfileZNamedTemporaryFilenamer-r.rr/r0r(closerlen)rZ filehandlefilenamer2rr3rrrtest_disk_file_warninghs z#TestWarnings.test_disk_file_warningc 
Cs>tjdd}|jd}WdQRX|jtdd|DdS)NT)r,shttp://www.crummybytes.com/css|]}dt|jkVqdS)zlooks like a URLN)r/r0).0r2rrr }sz?TestWarnings.test_url_warning_with_bytes_url..)r-r.rr(any)r warning_listrrrrtest_url_warning_with_bytes_urlxs z,TestWarnings.test_url_warning_with_bytes_urlc Cs>tjdd}|jd}WdQRX|jtdd|DdS)NT)r,zhttp://www.crummyunicode.com/css|]}dt|jkVqdS)zlooks like a URLN)r/r0)rHr2rrrrIszATestWarnings.test_url_warning_with_unicode_url..)r-r.rr(rJ)rrKrrrr!test_url_warning_with_unicode_urls z.TestWarnings.test_url_warning_with_unicode_urlc Cs>tjdd}|jd}WdQRX|jtdd|DdS)NT)r,s$http://www.crummybytes.com/ is greatcss|]}dt|jkVqdS)zlooks like a URLN)r/r0)rHr2rrrrIszETestWarnings.test_url_warning_with_bytes_and_space..)r-r.r assertFalserJ)rrKrrrr%test_url_warning_with_bytes_and_spaces z2TestWarnings.test_url_warning_with_bytes_and_spacec Cs>tjdd}|jd}WdQRX|jtdd|DdS)NT)r,z%http://www.crummyuncode.com/ is greatcss|]}dt|jkVqdS)zlooks like a URLN)r/r0)rHr2rrrrIszGTestWarnings.test_url_warning_with_unicode_and_space..)r-r.rrNrJ)rrKrrrr'test_url_warning_with_unicode_and_spaces z4TestWarnings.test_url_warning_with_unicode_and_spaceN)r"r#r$rGrLrMrOrPrrrrr%fs c@seZdZddZdS)TestSelectiveParsingcCs.d}td}|j||d}|j|jddS)Nz&NoYesNoYes Yesr8)r:sYesYes Yes)rrrr)rmarkupZstrainerrrrrtest_parse_with_soupstrainersz1TestSelectiveParsing.test_parse_with_soupstrainerN)r"r#r$rSrrrrrQsrQc@sxeZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ ddZ ddZddZddZdS)TestEntitySubstitutionz1Standalone tests of the EntitySubstitution class.cCs t|_dS)N)r sub)rrrrsetUpszTestEntitySubstitution.setUpcCsd}|j|jj|ddS)Nufoo\u2200☃\u00f5barufoo∀☃õbar)rrUsubstitute_html)rr)rrrtest_simple_html_substitutionsz4TestEntitySubstitution.test_simple_html_substitutioncCs&d}t|}|j|jj|jddS)Nsfooz‘’foo“”)r rrUrWrR)rZquotesdammitrrrtest_smart_quote_substitutionsz4TestEntitySubstitution.test_smart_quote_substitutioncCsd}|j|jj|d|dS)NzWelcome to "my 
bar"F)rrUsubstitute_xml)rr)rrrItest_xml_converstion_includes_no_quotes_if_make_quoted_attribute_is_falsesz`TestEntitySubstitution.test_xml_converstion_includes_no_quotes_if_make_quoted_attribute_is_falsecCs0|j|jjddd|j|jjddddS)NZWelcomeTz "Welcome"z Bob's Barz "Bob's Bar")rrUr[)rrrr6test_xml_attribute_quoting_normally_uses_double_quotesszMTestEntitySubstitution.test_xml_attribute_quoting_normally_uses_double_quotescCsd}|j|jj|dddS)NzWelcome to "my bar"Tz'Welcome to "my bar"')rrUr[)rr)rrrOtest_xml_attribute_quoting_uses_single_quotes_when_value_contains_double_quotesszfTestEntitySubstitution.test_xml_attribute_quoting_uses_single_quotes_when_value_contains_double_quotescCsd}|j|jj|dddS)NzWelcome to "Bob's Bar"Tz""Welcome to "Bob's Bar"")rrUr[)rr)rrrbtest_xml_attribute_quoting_escapes_single_quotes_when_value_contains_both_single_and_double_quotess zyTestEntitySubstitution.test_xml_attribute_quoting_escapes_single_quotes_when_value_contains_both_single_and_double_quotescCsd}|j|jj||dS)NzWelcome to "Bob's Bar")rrUr[)rZquotedrrrzfoo<bar>)rrUr[)rrrr'test_xml_quoting_handles_angle_bracketss z>TestEntitySubstitution.test_xml_quoting_handles_angle_bracketscCs|j|jjdddS)NzAT&TzAT&T)rrUr[)rrrr#test_xml_quoting_handles_ampersandssz:TestEntitySubstitution.test_xml_quoting_handles_ampersandscCs|j|jjdddS)Nz ÁT&Tz&Aacute;T&T)rrUr[)rrrrEtest_xml_quoting_including_ampersands_when_they_are_part_of_an_entitys z\TestEntitySubstitution.test_xml_quoting_including_ampersands_when_they_are_part_of_an_entitycCs|j|jjdddS)Nz ÁT&TzÁT&T)rrUZ"substitute_xml_containing_entities)rrrrDtest_xml_quoting_ignoring_ampersands_when_they_are_part_of_an_entitys z[TestEntitySubstitution.test_xml_quoting_ignoring_ampersands_when_they_are_part_of_an_entitycCsd}|j|jj||dS)z:There's no need to do this except inside attribute values.z Bob's "bar"N)rrUrW)rtextrrr 
test_quotes_not_html_substitutedsz7TestEntitySubstitution.test_quotes_not_html_substitutedN)r"r#r$__doc__rVrXrZr\r]r^r_r`rarbrcrdrfrrrrrTsrTcsNeZdZfddZddZddZddZd d Zee d d d Z Z S)TestEncodingConversioncs4tt|jd|_|jjd|_|j|jddS)NuUSacré bleu!zutf-8sUSacré bleu!)superrhrV unicode_datarr r)r) __class__rrrVs zTestEncodingConversion.setUpc Cstjj}tjtjzbdd}|tj_d}|j|}|j}|jt |t |j ||j |j|j |j jdWdtjtj|tj_XdS)NcSsdS)Nr)r/rrrnoopsz>TestEncodingConversion.test_ascii_in_unicode_out..noops azutf-8)bs4rYchardet_dammitloggingdisableWARNINGrdecoder( isinstancer/rZ document_forrlowerNOTSET)rchardetrlasciiZsoup_from_asciiZunicode_outputrrrtest_ascii_in_unicode_outs   z0TestEncodingConversion.test_ascii_in_unicode_outcCs@|j|j}|j|j|j|j|jjd|j|jddS)Nu Sacré bleu!)rrjrrrfoorr)rsoup_from_unicoderrrtest_unicode_in_unicode_outs z2TestEncodingConversion.test_unicode_in_unicode_outcCs2|j|j}|j|j|j|j|jjddS)Nu Sacré bleu!)rr rrrrjryr)rZsoup_from_utf8rrrtest_utf8_in_unicode_out s z/TestEncodingConversion.test_utf8_in_unicode_outcCs$|j|j}|j|jd|jdS)Nzutf-8)rrjrrr )rrzrrr test_utf8_outs z$TestEncodingConversion.test_utf8_outzQBad HTMLParser detected; skipping test of non-ASCII characters in attribute name.cCs(d}|j|j|jjd|jddS)Nu
r<)rrZdivr)rrRrrr1test_attribute_name_containing_unicode_charactersszHTestEncodingConversion.test_attribute_name_containing_unicode_characters) r"r#r$rVrxr{r|r}r PYTHON_3_PRE_3_2r~ __classcell__rr)rkrrhs rhc@seZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ ddZ ddZddZddZddZddZd d!Zd"d#Zd$S)%TestUnicodeDammitz"Standalone tests of UnicodeDammit.cCsd}t|}|j|j|dS)NuI'm already Unicode! ☃)r runicode_markup)rrRrYrrrtest_unicode_input"sz$TestUnicodeDammit.test_unicode_inputcCsd}t|}|j|jddS)Nsz#\u2018\u2019\u201c\u201d)r rr)rrRrYrrrtest_smart_quotes_to_unicode'sz.TestUnicodeDammit.test_smart_quotes_to_unicodecCs"d}t|dd}|j|jddS)NsZxml)smart_quotes_toz+‘’“”)r rr)rrRrYrrr!test_smart_quotes_to_xml_entities-s z3TestUnicodeDammit.test_smart_quotes_to_xml_entitiescCs"d}t|dd}|j|jddS)Nsr5)rz'‘’“”)r rr)rrRrYrrr"test_smart_quotes_to_html_entities3s z4TestUnicodeDammit.test_smart_quotes_to_html_entitiescCs"d}t|dd}|j|jddS)Nsrw)rz''"")r rr)rrRrYrrrtest_smart_quotes_to_ascii9s z,TestUnicodeDammit.test_smart_quotes_to_asciicCs0d}t|}|j|jjd|j|jddS)NsSacré bleu! ☃zutf-8uSacré bleu! ☃)r rrrtr)rr<rYrrrtest_detect_utf8?sz"TestUnicodeDammit.test_detect_utf8cCs4d}t|dg}|j|jjd|j|jddS)Nsz iso-8859-8z\u05dd\u05d5\u05dc\u05e9)r rrrtr)rhebrewrYrrrtest_convert_hebrewFs z%TestUnicodeDammit.test_convert_hebrewcCs6d}t|}|j|jjd|j|jjd|dS)Nsケータイ Watchzutf-8)r rrrtrr)rutf_8rYrrr/test_dont_see_smart_quotes_where_there_are_noneLszATestUnicodeDammit.test_dont_see_smart_quotes_where_there_are_nonecCs,djd}t|dg}|j|jjddS)Nu Räksmörgåszutf-8z iso-8859-8)rr rrrt)rr rYrrr test_ignore_inappropriate_codecsRs  z2TestUnicodeDammit.test_ignore_inappropriate_codecscCs:djd}x*dD]"}t||g}|j|jjdqWdS)Nu Räksmörgåszutf-8.utf8... 
utF---16.!)rrr)rr rrrt)rr Z bad_encodingrYrrrtest_ignore_invalid_codecsWs   z,TestUnicodeDammit.test_ignore_invalid_codecscCsLdjd}t|dgd}|j|jjdt|ddgd}|j|jddS)Nu Räksmörgåszutf-8)rz windows-1252)rr rrrt)rr rYrrrr!]s  z(TestUnicodeDammit.test_exclude_encodingscCs"td}t|j}d|kstdS)Ns'uutf-�)r list encodingsAssertionError)rZdetectedrrrrPtest_encoding_detector_replaces_junk_in_encoding_name_with_replacement_characterks zbTestUnicodeDammit.test_encoding_detector_replaces_junk_in_encoding_name_with_replacement_charactercCs,x&dD]}t|dd}|jd|jqWdS) N&&$#T)Zis_htmlzeuc-jp)rrrr)r rr)rrrYrrr test_detect_html5_style_meta_tagqs z2TestUnicodeDammit.test_detect_html5_style_meta_tagc Csd}tjj}tjtjzPdd}|tj_t|}|jd|j|j d|j kt |d}|j |jWdtjtj |tj_XdS)NsT بتر ѐcSsdS)Nr)r/rrrrlszBTestUnicodeDammit.test_last_ditch_entity_replacement..noopTz\ufffdz html.parser) rmrYrnrorprqr rZcontains_replacement_charactersr(rrru)rdocrvrlrYrrrr"test_last_ditch_entity_replacement|s   z4TestUnicodeDammit.test_last_ditch_entity_replacementcCs,d}t|}|jd|j|jd|jdS)Ns<a></a>u áézutf-16le)r rrr)rrrYrrrtest_byte_order_mark_removedsz.TestUnicodeDammit.test_byte_order_mark_removedcCsPdjd}djd}|||}|jt|jdtj|}|jd|jddS)Nu☃rr<u“Hi, I like Windows!” windows_1252u+☃☃☃“Hi, I like Windows!”☃☃☃u ☃☃☃)rr?UnicodeDecodeErrorrrr detwingler)rr<rrZfixedrrrtest_detwingles   z TestUnicodeDammit.test_detwinglecCsBxs6 ,/ E</PK!)pWW0tests/__pycache__/test_tree.cpython-36.opt-1.pycnu[3 6]8@sdZddlmZddlZddlZddlZddlZddlmZddl m Z m Z ddl m Z mZmZmZmZmZmZmZddlmZmZe jddk Ze jd dk ZGd d d eZGd d d eZGdddeZGdddeZGdddeZGdddeZ GdddeZ!GdddeZ"GdddeZ#Gddde#Z$Gddde#Z%Gd d!d!eZ&Gd"d#d#e&Z'Gd$d%d%e&Z(Gd&d'd'eZ)Gd(d)d)eZ*Gd*d+d+eZ+Gd,d-d-eZ,Gd.d/d/eZ-Gd0d1d1eZ.Gd2d3d3eZ/Gd4d5d5eZ0Gd6d7d7eZ1dS)8a8Tests for Beautiful Soup's tree traversal methods. The tree traversal methods are the main advantage of using Beautiful Soup over just using a parser. 
Different parsers will build different Beautiful Soup trees given the same markup, but all Beautiful Soup trees can be traversed with the methods tested here. ) set_traceN) BeautifulSoup)builder_registryHTMLParserTreeBuilder)PY3KCDataComment DeclarationDoctypeNavigableString SoupStrainerTag)SoupTestskipIfZxmlZlxmlc@seZdZddZddZdS)TreeTestcCs|jdd|D|dS)zMake sure that the given tags have the correct text. This is used in tests that define a bunch of tags, each containing a single string, and then select certain strings by some mechanism. cSsg|] }|jqS)string).0tagrr/usr/lib/python3.6/test_tree.py 2sz*TreeTest.assertSelects..N) assertEqual)selftags should_matchrrr assertSelects+szTreeTest.assertSelectscCs|jdd|D|dS)zMake sure that the given tags have the correct IDs. This is used in tests that define a bunch of tags, each containing a single string, and then select certain strings by some mechanism. cSsg|] }|dqS)idr)rrrrrr;sz-TreeTest.assertSelectsIDs..N)r)rrrrrrassertSelectsIDs4szTreeTest.assertSelectsIDsN)__name__ __module__ __qualname__rrrrrrr)s rc@s8eZdZdZddZddZddZdd Zd d Zd S) TestFindzBasic tests of the find() method. find() just calls find_all() with limit=1, so it's not tested all that thouroughly here. cCs"|jd}|j|jdjddS)Nz 1234b2)souprfindr)rr$rrr test_find_tagEs zTestFind.test_find_tagcCs"|jd}|j|jddddS)Nu

Räksmörgås

u Räksmörgås)r)r$rr%)rr$rrrtest_unicode_text_findIs zTestFind.test_unicode_text_findcCs,|jd}t||jd|jddjdS)Nu&

here it is

z here it isu Räksmörgås)r)r$strrr%text)rr$rrrtest_unicode_attribute_findMs z$TestFind.test_unicode_attribute_findcCs"|jd}|jdt|jdS)z)Test an optimization that finds all tags.zfoobarN)r$rlenfind_all)rr$rrrtest_find_everythingSs zTestFind.test_find_everythingcCs$|jd}|jdt|jddS)z;Test an optimization that finds all tags with a given name.zfoobarbazr+aN)r$rr,r-)rr$rrrtest_find_everything_with_nameXs z'TestFind.test_find_everything_with_nameN) rrr __doc__r&r'r*r.r0rrrrr!>s r!c@s8eZdZdZddZddZddZdd Zd d Zd S) TestFindAllz%Basic tests of the find_all() method.cCs|jd}|j|jdddg|j|jdddg|j|jddgdddg|j|jtjdddddg|j|jdddddgd S) z'You can search the tree for text nodes.uFoobar»bar)r)r)Fooz.*»TN)r$rr-recompile)rr$rrrtest_find_all_text_nodes`s  z$TestFindAll.test_find_all_text_nodescCs|jd}|j|jddddddg|j|jddddg|j|jdd ddddd d g|j|jdd ddddd d gd S)z7You can limit the number of items returned by find_all.z(12345r/)limit1r#3 45rN)r$rr-)rr$rrrtest_find_all_limitps zTestFindAll.test_find_all_limitcCs:|jd}|j|ddddg|j|jdddgdS) Nz!123r/r=)r:r;foo)rr<)r$rr")rr$rrr%test_calling_a_tag_is_calling_findall|s z1TestFindAll.test_calling_a_tag_is_calling_findallcCs.|jd}g}|j||jg|j|dS)Nz)r$appendrr-)rr$lrrrTtest_find_all_with_self_referential_data_structure_does_not_cause_infinite_recursions  z`TestFindAll.test_find_all_with_self_referential_data_structure_does_not_cause_infinite_recursioncCs^|jd}|jd}|jt|d|jd}|jt|d|jdd}|jt|ddS)z%All find_all calls return a ResultSetzr/sourceTrB)r)N)r$r- assertTruehasattr)rr$resultrrrtest_find_all_resultsets    z#TestFindAll.test_find_all_resultsetN) rrr r1r8rArCrFrKrrrrr2]s   r2c@seZdZddZdS)TestFindAllBasicNamespacescCs<|jd}|jd|jdj|jd|jddidjdS)Nz04r?z mathml:msqrtr/zsvg:fillZred)attrs)r$rr%rname)rr$rrrtest_find_by_namespaced_names z7TestFindAllBasicNamespaces.test_find_by_namespaced_nameN)rrr rOrrrrrLsrLcspeZdZdZfddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ ddZ ddZZS)TestFindAllByNamez&Test ways of finding tags by tag name.cstt|j|jd|_dS)NzFirst tag. Second tag. Third Nested tag. 
tag.)superrsetUpr$tree)r) __class__rrrRszTestFindAllByName.setUpcCs|j|jjdddgdS)Nr/z First tag.z Nested tag.)rrSr-)rrrrtest_find_all_by_tag_namesz+TestFindAllByName.test_find_all_by_tag_namecCs\|j|jjddddg|j|jjdddddg|j|jjdtjddddgdS)Nr/z First tag.)r)Tz Nested tag.r)rrSr-r6r7)rrrrtest_find_all_by_name_and_textsz0TestFindAllByName.test_find_all_by_name_and_textcCs|j|jjjddgdS)Nr/z Nested tag.)rrScr-)rrrr!test_find_all_on_non_root_elementsz3TestFindAllByName.test_find_all_on_non_root_elementcCs|j|jdddgdS)Nr/z First tag.z Nested tag.)rrS)rrrr%test_calling_element_invokes_find_allsz7TestFindAllByName.test_calling_element_invokes_find_allcCs |j|jjtdddgdS)Nr/z First tag.z Nested tag.)rrSr-r )rrrrtest_find_all_by_tag_strainersz/TestFindAllByName.test_find_all_by_tag_strainercCs"|j|jjddgdddgdS)Nr/r"z First tag.z Second tag.z Nested tag.)rrSr-)rrrrtest_find_all_by_tag_namessz,TestFindAllByName.test_find_all_by_tag_namescCs$|j|jjddddddgdS)NT)r/r"z First tag.z Second tag.z Nested tag.)rrSr-)rrrrtest_find_all_by_tag_dictsz+TestFindAllByName.test_find_all_by_tag_dictcCs$|j|jjtjddddgdS)Nz^[ab]$z First tag.z Second tag.z Nested tag.)rrSr-r6r7)rrrrtest_find_all_by_tag_resz)TestFindAllByName.test_find_all_by_tag_recCs,dd}|jd}|j|j|ddgdS)NcSs|j|jdkS)Nr)rNget)rrrrid_matches_nameszRTestFindAllByName.test_find_all_with_tags_matching_method..id_matches_namezMatch 1. Does not match. Match 2.zMatch 1.zMatch 2.)r$rr-)rr_rSrrr'test_find_all_with_tags_matching_methods z9TestFindAllByName.test_find_all_with_tags_matching_methodcCsx|jd}|jdd}|jdtjd}|jdddg\}}|jd|j|jd|j|jd|j|jd|jdS)NzH
1
2
3
divza dza br<r;)r$r%r6r7r-rr)rr$Zr1Zr2Zr3Zr4rrr%test_find_with_multi_valued_attributes z7TestFindAllByName.test_find_with_multi_valued_attribute)rrr r1rRrUrVrXrYrZr[r\r]r`rb __classcell__rr)rTrrPs   rPc@seZdZddZddZddZddZd d Zd d Zd dZ ddZ ddZ ddZ ddZ ddZddZddZddZdd Zd!S)"TestFindAllByAttributecCs&|jd}|j|jddddgdS)Nz Matching a. Non-matching Matching b.a. first)rz Matching a.z Matching b.)r$rr-)rrSrrrtest_find_all_by_attribute_namesz6TestFindAllByAttribute.test_find_all_by_attribute_namecCstdjd}djd}|j|}|j|jg|j|d|j|jg|j|jdd|j|jg|j|dgddS)Nuםולשutf8u)titlezsomething else)encoder$rr/r-decode)rZpeacedatar$rrr%test_find_all_by_utf8_attribute_values    zName match. Class match. Non-match. A tag called 'name1'. Zname1)rNzA tag called 'name1'.rN)rMz Name match.classZclass2z Class match.)r$rr-)rrSrrrtest_find_all_by_attribute_dictsz6TestFindAllByAttribute.test_find_all_by_attribute_dictcCs|jd}|j|jddddg|j|jddddg|j|jdd ddg|j|jdddg|j|jdd ddg|j|jdddg|j|jdd dgdS) Nz Class 1. Class 2. Class 1. Class 3 and 4. 
r/r;)class_zClass 1.rWr<zClass 3 and 4.r?)rM)r$rr-)rrSrrrtest_find_all_by_classsz-TestFindAllByAttribute.test_find_all_by_classcCst|jd}|jdtjdd}|j|dg|jdtjdd}|j|dg|jdtjdd}|j|dgdS)Nz#Found itZgaro)rozFound itr/zo b)r$r-r6r7r)rrSfrrr0test_find_by_class_when_multiple_classes_present-s zGTestFindAllByAttribute.test_find_by_class_when_multiple_classes_presentcCsd|jd}|j|jdtjddgdd}|j|jd|gdd}|j|jd|dgdS) NzFound itr/ZbazFound itcSs t|dkS)Nr9)r,)valuerrrbig_attribute_value@sznTestFindAllByAttribute.test_find_all_with_non_dictionary_for_attrs_finds_by_class..big_attribute_valuecSs t|dkS)Nr9)r,)rtrrrsmall_attribute_valueEszpTestFindAllByAttribute.test_find_all_with_non_dictionary_for_attrs_finds_by_class..small_attribute_value)r$rr-r6r7)rr$rurvrrr:test_find_all_with_non_dictionary_for_attrs_finds_by_class;s zQTestFindAllByAttribute.test_find_all_with_non_dictionary_for_attrs_finds_by_classcCs|jd}|jd\}}|j||g|jdd|j|g|jdd|j|g|jddd|j|g|jdd|jg|jdddS)Nz*r/rBr3zfoo bar)rozbar foo)r$r-r)rr$r/Za2rrr:test_find_all_with_string_for_attrs_finds_multiple_classesKs zQTestFindAllByAttribute.test_find_all_with_string_for_attrs_finds_multiple_classescCs0|jd}tddid}|j|j|dgdS)Nzi Match. Non-match.rre)rMzMatch.)r$r rr-)rrSstrainerrrr'test_find_all_by_attribute_soupstrainerWsz>TestFindAllByAttribute.test_find_all_by_attribute_soupstrainercCs&|jd}|j|jddddgdS)NzID present. No ID present. ID is empty.r/)rzNo ID present.)r$rr-)rrSrrr$test_find_all_with_missing_attribute_sz;TestFindAllByAttribute.test_find_all_with_missing_attributecCs&|jd}|j|jddddgdS)NzID present. No ID present. ID is empty.T)rz ID present.z ID is empty.)r$rr-)rrSrrr$test_find_all_with_defined_attributegsz;TestFindAllByAttribute.test_find_all_with_defined_attributecCs>|jd}ddg}|j|jdd||j|jdd|dS)Nz[Unquoted attribute. 
Quoted attribute.zUnquoted attribute.zQuoted attribute.r=)rr;)r$rr-)rrSZexpectedrrr$test_find_all_with_numeric_attributeps z;TestFindAllByAttribute.test_find_all_with_numeric_attributecCs,|jd}|j|jdddgdddgdS)Nz1 2 3 No ID.r;r<r?)r)r$rr-)rrSrrr(test_find_all_with_list_attribute_valuesysz?TestFindAllByAttribute.test_find_all_with_list_attribute_valuescCs,|jd}|j|jtjddddgdS)NzOne a. Two as. Mixed as and bs. One b. No ID.z^a+$)rzOne a.zTwo as.)r$rr-r6r7)rrSrrr5test_find_all_with_regular_expression_attribute_valueszLTestFindAllByAttribute.test_find_all_with_regular_expression_attribute_valuecCsX|jd}|j}|j|g|jddd|jg|jddd|jg|jddddS)Nzfoobarfoor/rB)r)r3)r$r/rr-)rr$r/rrr'test_find_by_name_and_containing_strings  z>TestFindAllByAttribute.test_find_by_name_and_containing_stringcCs*|jd}|j|jd|jddddS)Nz"foofoor/rB)r))r$rr-)rr$rrr=test_find_by_name_and_containing_string_when_string_is_burieds zTTestFindAllByAttribute.test_find_by_name_and_containing_string_when_string_is_buriedcCsB|jd}|j}|j|g|jddd|jg|jddddS)Nz"foofoor+rB)rr)r=r3)r$r/rr-)rr$r/rrr,test_find_by_attribute_and_containing_strings zCTestFindAllByAttribute.test_find_by_attribute_and_containing_stringN)rrr rfrlrnrprsrwrxrzr{r|r}r~rrrrrrrrrds       rdc@seZdZdZddZdS) TestIndexzTest Tag.indexcCsN|jd}|j}x(t|jD]\}}|j||j|qW|jt|jddS)Nah
Identical Not identical Identical Identical with child Also not identical Identical with child
r=)r$ra enumeratecontentsrindex assertRaises ValueError)rrSraielementrrr test_indexs zTestIndex.test_indexN)rrr r1rrrrrrsrcs`eZdZdZfddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ Z S)TestParentOperationsz;Test navigation and searching through an element's parents.cs(tt|j|jd|_|jj|_dS)Na1
          Start here
      )rQrrRr$rSr"start)r)rTrrrRszTestParentOperations.setUpcCsF|j|jjdd|j|jjjdd|j|jjjjdddS)Nrbottommiddletop)rrparent)rrrr test_parentsz TestParentOperations.test_parentcCs |jjd}|j|j|jdS)Nr)rSrrr)rZtop_tagrrr%test_parent_of_top_tag_is_soup_objects z:TestParentOperations.test_parent_of_top_tag_is_soup_objectcCs|jd|jjdS)N)rrSr)rrrrtest_soup_object_has_no_parentsz3TestParentOperations.test_soup_object_has_no_parentcCs8|j|jjddddg|j|jjddddgdS)Nulrrr)r)rrZ find_parents)rrrrtest_find_parentssz&TestParentOperations.test_find_parentscCs8|j|jjddd|j|jjddddddS)Nrrrr)r)rr find_parent)rrrrtest_find_parentsz%TestParentOperations.test_find_parentcCs"|jjdd}|j|jjddS)Nz Start here)r)r")rSr%rrrN)rr)rrrtest_parent_of_text_elementsz0TestParentOperations.test_parent_of_text_elementcCs(|jjdd}|j|jddddS)Nz Start here)r)rrr)rSr%rr)rr)rrrtest_text_element_find_parentsz2TestParentOperations.test_text_element_find_parentcCs(dd|jjD}|j|dddgdS)NcSs&g|]}|dk rd|jkr|dqS)Nr)rM)rrrrrrsz>TestParentOperations.test_parent_generator..rrr)rparentsr)rrrrrtest_parent_generatorsz*TestParentOperations.test_parent_generator)rrr r1rRrrrrrrrrrcrr)rTrrs rcseZdZfddZZS) ProximityTestcstt|j|jd|_dS)NzgOneTwoThree)rQrrRr$rS)r)rTrrrRszProximityTest.setUp)rrr rRrcrr)rTrrsrcsTeZdZfddZddZddZddZd d Zd d Zd dZ ddZ Z S)TestNextOperationscstt|j|jj|_dS)N)rQrrRrSr"r)r)rTrrrRszTestNextOperations.setUpcCs*|j|jjd|j|jjjdddS)NOnerr#)rr next_element)rrrr test_nextszTestNextOperations.test_nextcCs |jjdd}|j|jddS)NThree)r))rSr%rr)rZlastrrrtest_next_of_last_item_is_nonesz1TestNextOperations.test_next_of_last_item_is_nonecCs|j|jjddS)N)rrSr)rrrrtest_next_of_root_is_nonesz,TestNextOperations.test_next_of_root_is_nonecCsB|j|jjdddg|jjdd|j|jjdddgdS)Nr"Tworr9)r)rr find_all_next)rrrrtest_find_all_nextsz%TestNextOperations.test_find_all_nextcCs2|j|jjddd|j|jjddddS)Nr"rr#r)r))rr find_next)rrrrtest_find_next 
sz!TestNextOperations.test_find_nextcCs<|jjdd}|j|jdjd|j|jdddgdS)Nr)r)r"rr)rSr%rrrrr)rr)rrrtest_find_next_for_text_elementsz2TestNextOperations.test_find_next_for_text_elementcCsF|jjdd}dd|jD}|\}}|j|dd|j|ddS)Nr)r)cSsg|]}|qSrr)rnoderrrrsz:TestNextOperations.test_next_generator..rr<r)rSr%Z next_elementsr)rrZ successorsrrrrrtest_next_generators z&TestNextOperations.test_next_generator) rrr rRrrrrrrrrcrr)rTrrs rcsTeZdZfddZddZddZddZd d Zd d Zd dZ ddZ Z S)TestPreviousOperationscs"tt|j|jjdd|_dS)Nr)r))rQrrRrSr%end)r)rTrrrRszTestPreviousOperations.setUpcCs*|j|jjdd|j|jjjddS)Nrr<r)rrprevious_element)rrrr test_previous!sz$TestPreviousOperations.test_previouscCs|jjd}|j|jddS)Nhtml)rSr%rr)rrerrr#test_previous_of_first_item_is_none%s z:TestPreviousOperations.test_previous_of_first_item_is_nonecCsdS)Nr)rrrrtest_previous_of_root_is_none)sz4TestPreviousOperations.test_previous_of_root_is_nonecCs6|j|jjddddg|j|jjdddgdS)Nr"rrrr=)r)rrfind_all_previous)rrrrtest_find_all_previous/sz-TestPreviousOperations.test_find_all_previouscCs2|j|jjddd|j|jjddddS)Nr"rr<r)r))rr find_previous)rrrrtest_find_previous7sz)TestPreviousOperations.test_find_previouscCs>|jjdd}|j|jdjd|j|jddddgdS)Nr)r)r"rr)rSr%rrrrr)rr)rrr#test_find_previous_for_text_element;sz:TestPreviousOperations.test_find_previous_for_text_elementcCsh|jjdd}dd|jD}|\}}}}|j|dd|j|jd|j|jd|j|jd dS) Nr)r)cSsg|]}|qSrr)rrrrrrCszBTestPreviousOperations.test_previous_generator..rr;bodyheadr)rSr%Zprevious_elementsrrN)rrZ predecessorsr"rrrrrrtest_previous_generatorAs z.TestPreviousOperations.test_previous_generator) rrr rRrrrrrrrrcrr)rTrrs rcseZdZfddZZS) SiblingTestcs4tt|jd}tjdjd|}|j||_dS)Na z\n\s*)rQrrRr6r7subr$rS)rmarkup)rTrrrRPs zSiblingTest.setUp)rrr rRrcrr)rTrrNsrcsLeZdZfddZddZddZddZd d Zd d Zd dZ Z S)TestNextSiblingcs"tt|j|jjdd|_dS)Nr;)r)rQrrRrSr%r)r)rTrrrRfszTestNextSibling.setUpcCs|j|jjddS)N)rrS 
next_sibling)rrrr!test_next_sibling_of_root_is_nonejsz1TestNextSibling.test_next_sibling_of_root_is_nonecCsB|j|jjdd|j|jjjdd|j|jjdddS)Nrr#r<z1.1)rrrr)rrrrtest_next_siblingmsz!TestNextSibling.test_next_siblingcCsN|j|jjjd|jjdd}|j|jd|jjdd}|j|jddS)Nz1.1)rr?)rrSrrr%)r nested_spanZ last_spanrrrtest_next_sibling_may_not_existts z/TestNextSibling.test_next_sibling_may_not_existcCs|j|jjddddS)Nspanrr#)rrfind_next_sibling)rrrrtest_find_next_sibling}sz&TestNextSibling.test_find_next_siblingcCs6|j|jjddddg|j|jjdddgdS)Nrr#r<r?)r)rrfind_next_siblings)rrrrtest_next_siblingss z"TestNextSibling.test_next_siblingscCsv|jd}|jdd}|j|jjd|j|jjd|j|jddg|j|jddd|j|jddddS)NzFoobarbazr4)r)r"bazr3nonesuch)r$r%rrrNrrr)rr$rrrr"test_next_sibling_for_text_elements  z2TestNextSibling.test_next_sibling_for_text_element) rrr rRrrrrrrrcrr)rTrrds  rcsLeZdZfddZddZddZddZd d Zd d Zd dZ Z S)TestPreviousSiblingcs"tt|j|jjdd|_dS)Nr?)r)rQrrRrSr%r)r)rTrrrRszTestPreviousSibling.setUpcCs|j|jjddS)N)rrSprevious_sibling)rrrr%test_previous_sibling_of_root_is_nonesz9TestPreviousSibling.test_previous_sibling_of_root_is_nonecCsB|j|jjdd|j|jjjdd|j|jjdddS)Nrr<r#z3.1)rrrr)rrrrtest_previous_siblingsz)TestPreviousSibling.test_previous_siblingcCsN|j|jjjd|jjdd}|j|jd|jjdd}|j|jddS)Nz1.1)rr;)rrSrrr%)rrZ first_spanrrr#test_previous_sibling_may_not_exists z7TestPreviousSibling.test_previous_sibling_may_not_existcCs|j|jjddddS)Nrrr<)rrfind_previous_sibling)rrrrtest_find_previous_siblingsz.TestPreviousSibling.test_find_previous_siblingcCs6|j|jjddddg|j|jjdddgdS)Nrr<r#r;)r)rrfind_previous_siblings)rrrrtest_previous_siblingss z*TestPreviousSibling.test_previous_siblingscCsv|jd}|jdd}|j|jjd|j|jjd|j|jddg|j|jddd|j|jddddS)NzFoobarbazr)r)r"r4r3r)r$r%rrrNrrr)rr$rrrr&test_previous_sibling_for_text_elements  z:TestPreviousSibling.test_previous_sibling_for_text_element) rrr rRrrrrrrrcrr)rTrrs  rc@s0eZdZdZddZddZddZdd Zd S) TestTagCreationz$Test the ability to create new tags.cCsd|jd}|jddddid}|jt|t|jd|j|jtddd|j|jd|j dS)NrrBrrNza name)r3rM)r3rN) 
r$new_tagrH isinstancer rrNdictrMr)rr$rrrr test_new_tags  zTestTagCreation.test_new_tagcCstrBtdd}|jd}|jd}|jd|j|jd|jtdd}|jd}|jd}|jd|j|jd|jdS) Nrzlxml-xmlbrps
      s

      z html.parsers

      )XML_BUILDER_PRESENTrrrri)rZxml_soupZxml_brZxml_pZ html_soupZhtml_brZhtml_prrr1test_tag_inherits_self_closing_rules_from_builders      zATestTagCreation.test_tag_inherits_self_closing_rules_from_buildercCs4|jd}|jd}|jd||jt|tdS)NrrB)r$ new_stringrrHrr )rr$srrr'test_new_string_creates_navigablestrings   z7TestTagCreation.test_new_string_creates_navigablestringcCs6|jd}|jdt}|jd||jt|tdS)NrrB)r$rrrrHr)rr$rrrr3test_new_string_can_create_navigablestring_subclasss   zCTestTagCreation.test_new_string_can_create_navigablestring_subclassN)rrr r1rrrrrrrrrs rc@s<eZdZddZddZddZddZd d Zd d Zd dZ ddZ ddZ ddZ ddZ ddZddZddZddZdd Zd!d"Zd#d$Zd%d&Zd'd(Zd)d*Zd+d,Zd-d.Zd/d0Zd1d2Zd3d4Zd5d6Zd7d8Zd9d:Zd;d<Z d=d>Z!d?d@Z"dAdBZ#dCdDZ$dEdFZ%dGdHZ&dIdJZ'dKdLZ(dMS)NTestTreeModificationcCsl|jd}d|jd<|j|j|jd|jd=|j|j|jdd|jd<|j|j|jddS) Nzr+rzzrBZid2z)r$r/rrj document_for)rr$rrrtest_attribute_modifications   z0TestTreeModification.test_attribute_modificationcCsltjd}|jd|d}t||d}t||d}d|d<|jjd||jjd ||j|jjd dS) Nrz )builderr/olzhttp://foo.com/Zhrefrr=s4
        )rlookupr$r rinsertrri)rrr$r/rrrrtest_new_tag_creations   z*TestTreeModification.test_new_tag_creationcCs\d}|j|}|jdd}|j}|jddj|j|j|j||j|j|jddS)NzT

        Don't leave me here.

        Don't leave!

        r#)rzD

        Don't leave me .

        Don't leave!here

        )r$r%r"rDrrrjr)rdocr$Z second_paraZboldrrr!test_append_to_contents_moves_tags   z6TestTreeModification.test_append_to_contents_moves_tagcCs0d}|j|}|j}|j|j}|j||dS)Nz)r$r/ replace_withrWr)rr)r$r/new_arrr1test_replace_with_returns_thing_that_was_replaceds   zFTestTreeModification.test_replace_with_returns_thing_that_was_replacedcCs,d}|j|}|j}|j}|j||dS)Nz)r$r/unwrapr)rr)r$r/rrrr+test_unwrap_returns_thing_that_was_replaceds  z@TestTreeModification.test_unwrap_returns_thing_that_was_replacedcCsJ|jd}|j}|j|jd|j|jt|j|jt|j|j dS)NzFooBar) r$r/extractrrrrrrrW)rr$r/rrrItest_replace_with_and_unwrap_give_useful_exception_when_tag_has_no_parents  z^TestTreeModification.test_replace_with_and_unwrap_give_useful_exception_when_tag_has_no_parentcCs:d}|j|}|j}|jj||j|j|j|dS)Nz-Foo)r$rWrrrjr)rr)r$rWrrrtest_replace_tag_with_itself's   z1TestTreeModification.test_replace_tag_with_itselfcCs&d}|j|}|jt|jj|jdS)Nz)r$rrr"rr/)rr)r$rrr1test_replace_tag_with_its_parent_raises_exception.s zFTestTreeModification.test_replace_tag_with_its_parent_raises_exceptioncCs(d}|j|}|jt|jjd|jdS)Nzr)r$rrr/r)rr)r$rrr,test_insert_tag_into_itself_raises_exception3s zATestTreeModification.test_insert_tag_into_itself_raises_exceptionc Cs|jd}d}|j|}|jd|x|jD]}q,Wt|j\}}}}|jd|j|jd|j|jd|j|jd|jdS) zInserting one BeautifulSoup object into another actually inserts all of its children -- you'll never combine BeautifulSoup objects. z-

        And now, a word:

        And we're back.

        z

        p2

        p3

        r=zAnd now, a word:p2p3zAnd we're back.N)r$rZ descendantslistZchildrenrr) rr$r)Z to_insertrp1rrZp4rrr1test_insert_beautifulsoup_object_inserts_children8s    zFTestTreeModification.test_insert_beautifulsoup_object_inserts_childrencCsX|jd}|j}|jd}|jdd|j\}}|jd|jd|jd|jjdS)Nz

        onethree

        rr=ZtworZthree)r$r/rrZ replaceWithrr"r)rr$r/r"leftrightrrr3test_replace_with_maintains_next_element_throughoutLs      zHTestTreeModification.test_replace_with_maintains_next_element_throughoutcCsl|jd}|jddjd|jdd}|j}|j|j||j|j||j|jj||j|jddS)Nz Argh!zArgh!)r)zHooray!)r$r%rr"rrrr)rr$new_textr"rrrtest_replace_final_node[s  z,TestTreeModification.test_replace_final_nodecCs|jd}|jjdd|j|j|jd|jdd}|j|jd|j|jj||j|j d|j|j j ||j|j d|j|j|j dS)NzArgh!r=zHooray!z!Argh!Hooray!)r)zArgh!) r$r"rrrjrr%rrrrrW)rr$rrrrtest_consecutive_text_nodeses   z0TestTreeModification.test_consecutive_text_nodescCsT|jd}|jjdd|jjdd|jddg|jj|j|jjdjddS)Nzrr3rB)r$r/rrrr)rr$rrrtest_insert_stringzs  z'TestTreeModification.test_insert_stringcCs|j}|jd|d}t||d}|jdd|jjd||j|j|jd|j}|j|j ||j|j ||j dd }|j|j ||j|j ||j}|j|j ||j|j ||j dd }|j|j||j|j ||j|j |dS) Nz%Findlady!)rZmagictagrther=z=Findthelady!ZFind)r))Zdefault_builderr$r rr/rrjrr"rrr%rrrWr)rrr$Z magic_tagZb_tagr%Zc_tagrrrrtest_insert_tags,      z$TestTreeModification.test_insert_tagcCs0d}|j|}|jj|j|j||jdS)Nz)r$r/rDr"rrj)rrkr$rrr*test_append_child_thats_already_at_the_ends z?TestTreeModification.test_append_child_thats_already_at_the_endcCs2d}|j|}|jjd|j|jd|jdS)Nzrz)r$r/rdrrj)rrkr$rrr$test_move_tag_to_beginning_of_parents z9TestTreeModification.test_move_tag_to_beginning_of_parentcCs.|jd}|jjdd|jt|jddS)Nz
        r=ZContentsz
        Contents
        )r$rrrr()rr$rrr&test_insert_works_on_empty_element_tags z;TestTreeModification.test_insert_works_on_empty_element_tagcCs`|jd}|jjd|jjd|j|j|jd|jj|j|j|j|jddS)NzfoobarBAZQUUXzQUUXfooBAZbarzQUUXbarfooBAZ)r$r" insert_beforer/rrjr)rr$rrrtest_insert_befores   z'TestTreeModification.test_insert_beforecCs`|jd}|jjd|jjd|j|j|jd|jj|j|j|j|jddS)Nzfoobarr r zfooQUUXbarBAZzQUUXbarfooBAZ)r$r" insert_afterr/rrjr)rr$rrrtest_insert_afters   z&TestTreeModification.test_insert_aftercCsR|jd}|jd}|jd}|jt|j||jt|j||jt|j|dS)Nrr/)r$rrrrrNotImplementedError)rr$rrrrr:test_insert_after_raises_exception_if_after_has_no_meanings    zOTestTreeModification.test_insert_after_raises_exception_if_after_has_no_meaningcCsR|jd}|jd}|jd}|jt|j||jt|j||jt|j|dS)Nrr/)r$rrrrr r)rr$rrrrrFtest_insert_before_raises_notimplementederror_if_before_has_no_meanings    z[TestTreeModification.test_insert_before_raises_notimplementederror_if_before_has_no_meaningcCsv|jd}|jd\}}|j||j|j|jd|j|jd|j|j|j|j|jd|j|j ddS)Nz;

        There's no business like show business

        r"z0

        There's business like no business

        noz business) r$r-rrrjrrrrr)rr$rZshowrrrtest_replace_withs z&TestTreeModification.test_replace_withcCs0d}|j|}|jj|j|jd|jdS)Nzz)r$r"rrWrrj)rrkr$rrrtest_replace_first_childs z-TestTreeModification.test_replace_first_childcCs0d}|j|}|jj|j|jd|jdS)Nzz)r$rWrr"rrj)rrkr$rrrtest_replace_last_childs z,TestTreeModification.test_replace_last_childcCs|jd}|j}|j}|j||j|j|jd|j|jd|j|jddj d|j|j d|j|j d|j|j d|j|j|j |j|j d|j|j j |j|j|j d|jdd}|j}|j|j ||j|j ||j|j ||j|j |dS)NzQWereservetherighttorefuseservicez-Werefusetoservicer)r)ZWeZto)r$r"rrrrrjrrr%rrrrr/eg)rr$Z remove_tagZmove_tagZto_textZg_tagrrrtest_nested_tag_replace_withs.   z1TestTreeModification.test_nested_tag_replace_withcCs6|jd}|jj|j|jd|j|jjddS)NzI

        Unneeded formatting is unneeded

        zUnneeded formatting is unneeded)r$emrrrr))rrSrrr test_unwraps  z TestTreeModification.test_unwrapcCsF|jd}|jj|jd}|j|jd|j|j|jddS)NzI wish I was bold.r"zI wish I was bold.)r$rwraprrrjr)rr$rtrrr test_wrap"s  zTestTreeModification.test_wrapcCs4|jd}|jjj|j|j|j|jddS)NzI wish I was bold.zI wish I was bold.)r$r"rrrrjr)rr$rrr%test_wrap_extracts_tag_from_elsewhere)s z:TestTreeModification.test_wrap_extracts_tag_from_elsewherecCsH|jd}|jjj|j|jdt|jj|j|j|jddS)Nz+I like being bold.I wish I was bold.r+z+I like being bold.I wish I was bold.) r$r"rrrr,rrjr)rr$rrr&test_wrap_puts_new_contents_at_the_end/s   z;TestTreeModification.test_wrap_puts_new_contents_at_the_endcCs|jd}|jt|jjd|jddj}|j|jd|j|jd|jt|jjd|j|jd|j|j d|j|j j d|jdd }|jd d }|j|j ||j|j ||j|j ||j|j |dS) NzRSome content. More content.r9Znav)rz6Some content. More content.zr+zSome content. )r)z More content.) r$rr,rrr%rrjrrrrr)rr$Z extractedZ content_1Z content_2rrr test_extract7s"   z!TestTreeModification.test_extractcCsz|jd}|jj}|jj}|jd}|jd}|jj||jj||j|j|j||jj|j||jjdS)NzfoobarrBr3)r$r/rr"rrDrr)rr$Zfoo_1Zbar_1Zfoo_2Zbar_2rrr4test_extract_distinguishes_between_identical_stringsPs     zITestTreeModification.test_extract_distinguishes_between_identical_stringscs8|jdfddjdD|jdtjdS)Nzv csg|]}jjqSr)scriptr)rr)r$rrrmszKTestTreeModification.test_extract_multiples_of_same_tag..r#z )r$r-rr(r)rr)r$r"test_extract_multiples_of_same_tagas z7TestTreeModification.test_extract_multiples_of_same_tagcCs.|jd}|jdj|jd|jddS)Nz hi r)r$r%rr)rr$rrrBtest_extract_works_when_element_is_surrounded_by_identical_stringsqszWTestTreeModification.test_extract_works_when_element_is_surrounded_by_identical_stringscCsf|jd}|j}|jj|jt|jjd|jt|d|j }|jdd|jdt|jdS)z Tag.clear()z4

        String Italicized and another

        rrT)Z decomposeN) r$r/rclearrr,rrHrIr)rr$r/rrrr test_clearzs   zTestTreeModification.test_clearcCsB|jd}d|j_|j|jjdgd|j_|j|jjdgdS)zTag.string = 'string'z rBr3N)r$r/rrrr")rr$rrrtest_string_sets  z$TestTreeModification.test_string_setcCs,|jd}|jj|j_|j|jjddS)Nzfoobarsbarbar)r$rWrr"rr/ri)rr$rrr/test_string_set_does_not_affect_original_strings  zDTestTreeModification.test_string_set_does_not_affect_original_stringcCs2|jd}td}||j_|jt|jjtdS)NzrB)r$rr/rrHr)rr$cdatarrr)test_set_string_preserves_class_of_strings z>TestTreeModification.test_set_string_preserves_class_of_stringN))rrr rrrrrrrrrrrrrrrrr r rrrrrrrrrrrr r!r"r$r%r'r(r)r+rrrrrsL       $ rc@sxeZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ ddZ ddZddZddZdS)TestElementObjectsz)Test various features of element objects.cCsV|jd}|jt|jd|jt|d|jt|jd|jt|jjddS)z3The length of an element is its number of children.z123r=r9N)r$rr,rr)rr$rrrtest_lens  zTestElementObjects.test_lencCsL|jd}|j|j|jd|j|jj|jdjd|j|jddS)z2Accessing a Python member .foo invokes find('foo')zr"rN)r$rr"r%rr/)rr$rrrtest_member_access_invokes_finds z2TestElementObjects.test_member_access_invokes_findc CsP|jd}tjdd }|j}WdQRX|j|j||jdt|djdS)NzT)recordzp.bTag is deprecated, use .find("b") instead. If you really were looking for a tag called bTag, use .find("bTag")r)r$warningscatch_warningsZbTagrr"r(message)rr$wrrrrtest_deprecated_member_accesss z0TestElementObjects.test_deprecated_member_accesscCs2|jd}|j|jjd|j|jjddS)zhas_attr() checks for the presence of an attribute. Please note note: has_attr() is different from __in__. has_attr() checks the tag's attributes and __in__ checks the tag's chidlren. 
zattrZattr2N)r$rHrBhas_attr assertFalse)rr$rrr test_has_attrs z TestElementObjects.test_has_attrcCsd}|j|ddS)Nz%z%)ZassertSoupEquals)rrrrr.test_attributes_come_out_in_alphabetical_orderszATestElementObjects.test_attributes_come_out_in_alphabetical_ordercCs|jd}|j|jjddS)Nz foorB)r$rr"r)rr$rrr test_strings zTestElementObjects.test_stringcCs|jd}|j|jjddS)Nz)r$rr"r)rr$rrrtest_empty_tag_has_no_strings z/TestElementObjects.test_empty_tag_has_no_stringcCs`|jd}|j|jjd|jd}|j|jjd|jd}|jjdd|j|jjddS)Nzfoo
        zfoobar
        z foo
        r=r3)r$rr"rr/r)rr$rrr-test_tag_with_multiple_children_has_no_strings   z@TestElementObjects.test_tag_with_multiple_children_has_no_stringcCs,|jd}|j|jjd|j|jddS)NzfoorB)r$rr/r)rr$rrr)test_tag_with_recursive_string_has_strings zfeozN)r$r7r"r)rr$rrrtest_lack_of_strings  z&TestElementObjects.test_lack_of_stringcCs`|jd}|j|jjd|j|jjddd|j|jjdd|j|jjddddd S) zBTag.text and Tag.get_text(sep=u"") -> all child text, concatenatedzar t zar t T)stripZart,z a,r, , t za,r,tN)r$rr/r)get_text)rr$rrr test_all_texts  z TestElementObjects.test_all_textcCsJ|jd}|j|jd|j|jttfdd|j|jddddS)NzfoobarZfoobar)typesZ fooIGNOREbar)r$rrAr r)rr$rrrtest_get_text_ignores_commentss  z1TestElementObjects.test_get_text_ignores_commentscCs$|jd}|jddgt|jdS)NzfoobarrBr3)r$rrZstrings)rr$rrr!test_all_strings_ignores_commentss z4TestElementObjects.test_all_strings_ignores_commentsN)rrr r1r-r.r4r8r9r:r;r<r=r>rBrDrErrrrr,s   r,c@sPeZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ dS)TestCDAtaListAttributesz0Testing cdata-list attributes like 'class'. cCs"|jd}|jdg|jddS)NzrBrm)r$rr/)rr$rrrtest_single_value_becomes_list s z6TestCDAtaListAttributes.test_single_value_becomes_listcCs$|jd}|jddg|jddS)NzrBr3rm)r$rr/)rr$rrr!test_multiple_values_becomes_lists z9TestCDAtaListAttributes.test_multiple_values_becomes_listcCs&|jd}|jdddg|jddS)NzrBr3rrm)r$rr/)rr$rrr2test_multiple_values_separated_by_weird_whitespaces zJTestCDAtaListAttributes.test_multiple_values_separated_by_weird_whitespacecCs |jd}|jd|jjdS)Nzs)r$rr/ri)rr$rrr,test_attributes_joined_into_string_on_outputs zDTestCDAtaListAttributes.test_attributes_joined_into_string_on_outputcCs$|jd}|jdg|jjddS)Nzzabc defr)r$rr/Zget_attribute_list)rr$rrrtest_get_attribute_lists z/TestCDAtaListAttributes.test_get_attribute_listcCs$|jd}|jddg|jddS)Nz(
        z ISO-8859-1zUTF-8zaccept-charset)r$rZform)rr$rrrtest_accept_charset!s z+TestCDAtaListAttributes.test_accept_charsetcCs$d}|j|}|jd|jddS)Nz)zISO-8859-1 UTF-8zaccept-charset)r$rr/)rrkr$rrr-test_cdata_attribute_applying_only_to_one_tag%s zETestCDAtaListAttributes.test_cdata_attribute_applying_only_to_one_tagcs6|jdj|jdjfdd}|jt|dS)Nrcs d_dS)NrB)rNr)rrrt0szJTestCDAtaListAttributes.test_string_has_immutable_name_property..t)r$rrrNrAttributeError)rrNr)rr'test_string_has_immutable_name_property-s  z?TestCDAtaListAttributes.test_string_has_immutable_name_propertyN) rrr r1rGrHrIrJrKrLrMrPrrrrrF srFcs`eZdZdZfddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ Z S)TestPersistencez*Testing features like pickle and deepcopy.cs&tt|jd|_|j|j|_dS)Nay Beautiful Soup: We called him Tortoise because he taught us. foo bar )rQrQrRZpager$rS)r)rTrrrR7szTestPersistence.setUpcCs@tj|jd}tj|}|j|jt|j|j|jjdS)Nr+)pickledumpsrSloadsrrTrrj)rdumpedloadedrrr!test_pickle_and_unpickle_identityKs z1TestPersistence.test_pickle_and_unpickle_identitycCs&tj|j}|j|j|jjdS)N)copydeepcopyrSrrj)rZcopiedrrrtest_deepcopy_identitySs z&TestPersistence.test_deepcopy_identitycCs:tdd}|j}|j}|jdt||j||jdS)Ns

         

        z html.parseru

         

        )rZoriginal_encoding__copy__rr()rr$encodingrXrrrtest_copy_preserves_encodingXs  z,TestPersistence.test_copy_preserves_encodingcCs>d}|j|}tj|tj}tj|}|j|j|jdS)Nu )r$rRrSZHIGHEST_PROTOCOLrTrrj)rrr$rUrVrrrtest_unicode_pickle_s   z#TestPersistence.test_unicode_picklecCszd}|j|}|jdd}tj|}|j|||jd|j|jd|j|jd|j|jd|j|jd|jdS)NzFooBarr4)r) r$r%rXrrrassertNotEqualrr)rrr$s1s2rrr1test_copy_navigablestring_is_not_attached_to_treegs    zATestPersistence.test_copy_navigablestring_is_not_attached_to_treecCs>d}|j|}|j}tj|}|j|||jt|tdS)Nz)r$rrXrrHrr)rrr$r`rarrr0test_copy_navigablestring_subclass_has_same_typess    z@TestPersistence.test_copy_navigablestring_subclass_has_same_typecCs(d}|j|}tj|}|j||dS)Nz)
        FooBar
        end)r$rXr)rrr$Z soup_copyrrrtest_copy_entire_soup{s  z%TestPersistence.test_copy_entire_soupcCsd}|j|}|j}tj|}|jt|t||j|||j||k|jd|j|jd|j|jd|jddj |j d|jddj dS)Nz)
        FooBar
        endZBar)r) r$rarXrr(r7rrr%rr_)rrr$raZdiv_copyrrrtest_copy_tag_copies_contentss   z-TestPersistence.test_copy_tag_copies_contents)rrr r1rRrWrZr]r^rbrcrdrercrr)rTrrQ4s  rQc@seZdZddZddZddZddZd d Zd d Zd dZ ddZ ddZ ddZ ddZ ddZddZddZddZdd Zd!S)"TestSubstitutionscCs0d}|j|}|jdd}|j||jddS)Nu#<<Sacré bleu!>>minimal) formatter)r$rjrr)rrr$decodedrrr!test_default_formatter_is_minimals  z3TestSubstitutions.test_default_formatter_is_minimalcCs0d}|j|}|jdd}|j||jddS)Nu'
        <<Sacré bleu!>>r)rhz.
        <<Sacré bleu!>>)r$rjrr)rrr$rirrrtest_formatter_htmls   z%TestSubstitutions.test_formatter_htmlcCs0d}|j|}|jdd}|j||jddS)Nu'
        <<Sacré bleu!>>Zhtml5)rhz-
        <<Sacré bleu!>>)r$rjrr)rrr$rirrrtest_formatter_html5s   z&TestSubstitutions.test_formatter_html5cCs0d}|j|}|jdd}|j||jddS)Nu#<<Sacré bleu!>>rg)rh)r$rjrr)rrr$rirrrtest_formatter_minimals  z(TestSubstitutions.test_formatter_minimalcCs0d}|j|}|jdd}|j||jddS)Nu#<<Sacré bleu!>>)rhu<>)r$rjrr)rrr$rirrrtest_formatter_nulls   z%TestSubstitutions.test_formatter_nullcCs4d}|j|}|jddd}|j||jddS)Nz!<foo>bar
        cSs|jS)N)upper)xrrrsz9TestSubstitutions.test_formatter_custom..)rhzBAR
        )r$rjrr)rrr$rirrrtest_formatter_customs  z'TestSubstitutions.test_formatter_customcCsd}|j|}|j}d}|j||j|j||jddd}|j||jdd|j||jddd}|j||jdd ddS) Nu%eu)erg)rhz/eru%EcSs|jS)N)ro)rprrrrqszMTestSubstitutions.test_formatter_is_run_on_attribute_values..)r$r/rrj)rrr$r/Zexpect_minimalZ expect_htmlZ expect_upperrrr)test_formatter_is_run_on_attribute_valuess z;TestSubstitutions.test_formatter_is_run_on_attribute_valuescCs$d}t|dj}|jd|kdS)NzO z html.parsers < < hey > >)rrirH)rrencodedrrr2test_formatter_skips_script_tag_for_html_documentsszDTestSubstitutions.test_formatter_skips_script_tag_for_html_documentscCs$d}t|dj}|jd|kdS)NzF z html.parsers < < hey > >)rrirH)rrrtrrr1test_formatter_skips_style_tag_for_html_documentsszCTestSubstitutions.test_formatter_skips_style_tag_for_html_documentscCs |jd}|jd|jjdS)Nz*
        foo
          	bar
          
          
        baz z/
        foo
          	bar
          
          
        baz
        )r$rraprettify)rr$rrr,test_prettify_leaves_preformatted_text_alones z>TestSubstitutions.test_prettify_leaves_preformatted_text_alonecCs,tdd}|jddd}|jd|kdS)Nzfooz html.parsercSs|jS)N)ro)rprrrrqszLTestSubstitutions.test_prettify_accepts_formatter_function..)rhZFOO)rrwrH)rr$Zprettyrrr(test_prettify_accepts_formatter_functions z:TestSubstitutions.test_prettify_accepts_formatter_functioncCs"|jd}|jtt|jdS)Nz)r$rr(typerw)rr$rrr(test_prettify_outputs_unicode_by_defaults z:TestSubstitutions.test_prettify_outputs_unicode_by_defaultcCs$|jd}|jtt|jddS)Nzzutf-8)r$rbytesrzrw)rr$rrrtest_prettify_can_encode_datas z/TestSubstitutions.test_prettify_can_encode_datacCs0d}|j|}|jjd}|j||jddS)NuSacré bleu!zutf-8)r$r"rir)rrr$rtrrr,test_html_entity_substitution_off_by_defaults  z>TestSubstitutions.test_html_entity_substitution_off_by_defaultcCsd}|j|}|j|jdd|jd}|jd|k|jd}|jd|k|jd}|jd |k|jd jd }|jd |kdS) NzEZcontentztext/html; charset=x-sjiszutf-8s charset=utf-8euc_jpscharset=euc_jpz shift-jisscharset=shift-jiszutf-16zcharset=utf-16)r$rmetarirHrj)rZmeta_tagr$utf_8r shift_jisZutf_16_urrrtest_encoding_substitution s    z,TestSubstitutions.test_encoding_substitutioncCs2d}td}|j||d}|j|jdjddS)Nz`
        foo
        Zpre)Z parse_onlyr)r r$rrrN)rrryr$rrr;test_encoding_substitution_doesnt_happen_if_tag_is_strained$szMTestSubstitutions.test_encoding_substitution_doesnt_happen_if_tag_is_strainedN)rrr rjrkrlrmrnrrrsrurvrxryr{r}r~rrrrrrrfs       rfc@sPeZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ dS) TestEncodingz0Test the ability to encode objects into strings.cCs.d}|j|}|j|jjjddjddS)Nu zutf-8u☃)r$rr"rri)rrr$rrr"test_unicode_string_can_be_encoded2s z/TestEncoding.test_unicode_string_can_be_encodedcCs,d}|j|}|j|jjd|jddS)Nu zutf-8)r$rr"ri)rrr$rrr1test_tag_containing_unicode_string_can_be_encoded8s z>TestEncoding.test_tag_containing_unicode_string_can_be_encodedcCs&d}|j|}|j|jjdddS)Nu asciis)r$rr"ri)rrr$rrrs zITestEncoding.test_encoding_substitutes_unrecognized_characters_by_defaultcCs&d}|j|}|jt|jddddS)Nu rstrict)errors)r$rUnicodeEncodeErrorri)rrr$rrr test_encoding_can_be_made_strictCs z-TestEncoding.test_encoding_can_be_made_strictcCs$d}|j|}|jd|jjdS)Nu u☃)r$rr"Zdecode_contents)rrr$rrrtest_decode_contentsIs z!TestEncoding.test_decode_contentscCs.d}|j|}|jdjd|jjdddS)Nu u☃rg)r\)r$rrir"Zencode_contents)rrr$rrrtest_encode_contentsNs  z!TestEncoding.test_encode_contentscCs*d}|j|}|jdjd|jjdS)Nu u☃rg)r$rrir"ZrenderContents)rrr$rrrtest_deprecated_renderContentsUs z+TestEncoding.test_deprecated_renderContentscCs8d}|j|}tr$|j|t|n|jdt|dS)Nu s \u2603)r$rrrepr)rrr$rrr test_repr[s  zTestEncoding.test_reprN) rrr r1rrrrrrrrrrrrr/src@s,eZdZddZddZddZddZd S) TestNavigableStringSubclassescCsX|jd}td}|jd||jt|d|j|jddd|j|jdddS)NrrBr=z)r)r)r$rrrr(r%r)rr$r*rrr test_cdataes   z(TestNavigableStringSubclasses.test_cdatacsVd_fdd}jd}td}|jd|jd|j|djdjd S) zkText inside a CData object is passed into the formatter. But the return value is ignored. 
rcsjd7_dS)Nr=zBITTER FAILURE)count)args)rrr incrementvszNTestNavigableStringSubclasses.test_cdata_is_never_formatted..incrementrz<><><>r=s<><>]]>)rhN)rr$rrrri)rrr$r*r)rrtest_cdata_is_never_formattedos   z;TestNavigableStringSubclasses.test_cdata_is_never_formattedcCs2td}|jd}|jd||j|jddS)NrBrr=s )r r$rrri)rZdoctyper$rrrtest_doctype_ends_in_newlines  z:TestNavigableStringSubclasses.test_doctype_ends_in_newlinecCstd}|jd|jdS)NrBz)r rZ output_ready)rrrrrtest_declarationsz.TestNavigableStringSubclasses.test_declarationN)rrr rrrrrrrrrcs rc@seZdZdZddZddZeZddZdd Zd d Z d d Z ddZ ddZ ddZ ddZddZddZddZddZddZd d!Zd"d#Zd$d%Zd&d'Zd(d)Zd*d+Zd,d-Zd.d/Zd0d1Zd2d3Zd4d5Zd6d7Zd8d9Z d:d;Z!dd?Z#d@dAZ$dBdCZ%dDdEZ&dFdGZ'dHdIZ(dJdKZ)dLdMZ*dNdOZ+dPdQZ,dRdSZ-dTdUZ.dVdWZ/dXdYZ0dZd[Z1d\d]Z2d^d_Z3d`daZ4dbdcZ5dddeZ6dfdgZ7dhdiZ8djdkZ9dldmZ:dndoZ;dpdqZ The title Hello there.

        An H1

        Some text

        Some more text

        An H2

        Another

        Bob

        Another H2

        me span1a1 span1a2 test span2a1

        English

        English UK

        English US

        French

        cCst|jd|_dS)Nz html.parser)rHTMLr$)rrrrrRszTestSoupSelector.setUpcKsRdd|jj|f|D}|j|j|j||d|dj|dj|fdS)NcSsg|] }|dqS)rr)relrrrrsz2TestSoupSelector.assertSelects..z$Selector %s, expected [%s], got [%s]z, )r$selectsortrjoin)rselector expected_idskwargsZel_idsrrrrs zTestSoupSelector.assertSelectscGs"x|D]\}}|j||qWdS)N) assertSelect)rZtestsrrrrrassertSelectMultiplesz%TestSoupSelector.assertSelectMultiplecCsF|jjd}|jt|d|j|djd|j|djdgdS)Nrhr=rz The title)r$rrr,rNr)relsrrrtest_one_tag_ones z!TestSoupSelector.test_one_tag_onecCsX|jjd}|jt|dx|D]}|j|jdq"W|jjd}|jd|ddS)Nramainr)r$rrr,rN select_one)rrrarrrrtest_one_tag_manys    z"TestSoupSelector.test_one_tag_manycCs|jjd}|jd|dS)NZnonexistenttag)r$rr)rmatchrrr(test_select_one_returns_none_if_no_matchs z9TestSoupSelector.test_select_one_returns_none_if_no_matchcCs |jjd}|jdddgdS)Nzdiv divinnerdata1)r$rr)rrrrrtest_tag_in_tag_ones z$TestSoupSelector.test_tag_in_tag_onecCs&x dD]}|j|ddddgqWdS) Nhtml div html body divbody divrrrfooter)rrr)r)rrrrrtest_tag_in_tag_manys z%TestSoupSelector.test_tag_in_tag_manycCsB|jddgdd|jdddgdd|jdd ddd gd ddS) Nzhtml divrr=)r:z html body divrr+zbody divrrr>)r)rrrr test_limitszTestSoupSelector.test_limitcCs|jt|jjdddS)Ndelr)rr,r$r)rrrrtest_tag_no_matchsz"TestSoupSelector.test_tag_no_matchcCs|jt|jjddS)Nztag%t)rrr$r)rrrrtest_invalid_tagsz!TestSoupSelector.test_invalid_tagcCs|jdddgdS)Nzcustom-dashed-tagdash1dash2)r)rrrrtest_select_dashed_tag_idssz+TestSoupSelector.test_select_dashed_tag_idscCs6|jjd}|j|djd|j|ddddS)Nzcustom-dashed-tag[id="dash2"]rzcustom-dashed-tagrr)r$rrrN)rZdashedrrrtest_select_dashed_by_ids z)TestSoupSelector.test_select_dashed_by_idcCs|j|jjddjddS)Nzbody > custom-dashed-tagrz Hello there.)rr$rr))rrrrtest_dashed_tag_textsz%TestSoupSelector.test_dashed_tag_textcCs |j|jjd|jjddS)Nzcustom-dashed-tag)rr$rr-)rrrr#test_select_dashed_matches_find_allsz4TestSoupSelector.test_select_dashed_matches_find_allcCs|jddgfdddgfdS)NZh1header1Zh2header2header3)r)rrrrtest_header_tags 
sz!TestSoupSelector.test_header_tagscCsVxPd D]H}|jj|}|jt|d|j|djd|j|dddgqWdS) N.onepp.onep html p.onepr=rrrmonep)rrr)r$rrr,rN)rrrrrrtest_class_ones   zTestSoupSelector.test_class_onecCs |jjd}|jt|ddS)Nzdiv.onepr)r$rrr,)rrrrrtest_class_mismatched_tags z*TestSoupSelector.test_class_mismatched_tagcCs xdD]}|j|dgqWdS)N div#inner#inner div div#innerr)rrr)r)rrrrr test_one_ids zTestSoupSelector.test_one_idcCs |jjd}|jt|ddS)Nz #doesnotexistr)r$rrr,)rrrrr test_bad_ids zTestSoupSelector.test_bad_idcCsf|jjd}|jt|dx|D]}|j|jdq"W|j|dddg|j|djddS)Nz div#inner pr9rr=rmrr)r$rrr,rNr7r6)rrrrrrtest_items_in_id#s   z!TestSoupSelector.test_items_in_idcCs*x$dD]}|jt|jj|dqWdS)N div#main deldiv#main div.oops div div#mainr)rrr)rr,r$r)rrrrrtest_a_bunch_of_emptys+s z'TestSoupSelector.test_a_bunch_of_emptyscCs xd D]}|j|d gqWdS) N.class1p.class1.class2p.class2.class3p.class3 html p.class2div#inner .class2pmulti)rrrrrrrr)r)rrrrrtest_multi_class_support/sz)TestSoupSelector.test_multi_class_supportcCs xdD]}|j|dgqWdS)N.class1.class3.class3.class2.class1.class2.class3r)rrr)r)rrrrrtest_multi_class_selection4sz+TestSoupSelector.test_multi_class_selectioncCs"|jdddg|jddgdS)Nz.s1 > as1a1s1a2z .s1 > a spans1a2s1)r)rrrrtest_child_selector9sz$TestSoupSelector.test_child_selectorcCs|jddgdS)Nz.s1 > a#s1a2 spanr)r)rrrrtest_child_selector_id=sz'TestSoupSelector.test_child_selector_idcCst|jddgfddgfddgfddgfddgfddgfd dgfd gfd dgfd dgfd dgfdgfdgfdgfdS)Nzp[class="onep"]rz p[id="p1"]z[class="onep"]z [id="p1"]zlink[rel="stylesheet"]l1zlink[type="text/css"]zlink[href="blah.css"]zlink[href="no-blah.css"]z[rel="stylesheet"]z[type="text/css"]z[href="blah.css"]z[href="no-blah.css"]zp[href="no-blah.css"])r)rrrrtest_attribute_equals@sz&TestSoupSelector.test_attribute_equalsc Cs\|jddgfddgfddgfddgfddgfddgfdd gfd d gfd d gfd d gf dS) Nzp[class~="class1"]rzp[class~="class2"]zp[class~="class3"]z[class~="class1"]z[class~="class2"]z[class~="class3"]za[rel~="friend"]bobz a[rel~="met"]z[rel~="friend"]z 
[rel~="met"])r)rrrrtest_attribute_tildeRsz%TestSoupSelector.test_attribute_tildecCsv|jddgfddgfdgfdgfdgfddgfdd d gfd d d gfd d dgfdd dgfddgfdd gfddgf dS)Nz[rel^="style"]rzlink[rel^="style"]znotlink[rel^="notstyle"]z[rel^="notstyle"]zlink[rel^="notstyle"]zlink[href^="bla"]za[href^="http://"]rmez[href^="http://"]z [id^="p"]rrz [id^="m"]rz div[id^="m"]z a[id^="m"]zdiv[data-tag^="dashed"]r)r)rrrrtest_attribute_startswith`s    z*TestSoupSelector.test_attribute_startswithc CsH|jddgfddgfddgfdddddd d d d gfd dgfdgfdS)Nz[href$=".css"]rzlink[href$=".css"]z link[id$="1"]z [id$="1"]rrrrs2a1rrz div[id$="1"]z[id$="noending"])r)rrrrtest_attribute_endswithqsz(TestSoupSelector.test_attribute_endswithcCs|jddgfddgfdgfdgfdgfddgfdd d gfd d d gfddgfdd gfddgfddgfddgfdddd ddddddg fddgfdgfdd d dgfdd d gfd dgfd!dd"gfd#d"gfd$dgfdS)%Nz[rel*="style"]rzlink[rel*="style"]znotlink[rel*="notstyle"]z[rel*="notstyle"]zlink[rel*="notstyle"]zlink[href*="bla"]z[href*="http://"]rrz [id*="p"]rrz div[id*="m"]rz a[id*="m"]z[href*=".css"]zlink[href*=".css"]z link[id*="1"]z [id*="1"]rrrrrrrz div[id*="1"]z[id*="noending"]z [href*="."]z a[href*="."]zlink[href*="."]z div[id*="n"]rz div[id*="nn"]zdiv[data-tag*="edval"])r)rrrrtest_attribute_contains{s.     z(TestSoupSelector.test_attribute_containscCs2|jddddgfddddgfddgfdgfdS) Nz p[lang|="en"]zlang-enz lang-en-gbz lang-en-usz [lang|="en"]z p[lang|="fr"]zlang-frz p[lang|="gb"])r)rrrrtest_attribute_exact_or_hypens   z.TestSoupSelector.test_attribute_exact_or_hypenc CsV|jddddgfddgfdddgfddd d d gfd d dgfdgfdgfddgfdS)Nz[rel]rrrz link[rel]za[rel]z[lang]zlang-enz lang-en-gbz lang-en-uszlang-frzp[class]rrz[blah]zp[blah]z div[data-tag]r)r)rrrrtest_attribute_existss   z&TestSoupSelector.test_attribute_existscCs,d}t|d}|jd\}|jd|jdS)Nz]
        nope
        yes
        z html.parserzdiv[style="display: right"]yes)rrrr)rrr$Zchosenrrr"test_quoted_space_in_selector_names  z3TestSoupSelector.test_quoted_space_in_selector_namecCs(|jt|jjd|jt|jjddS)Nza:no-such-pseudoclassza:nth-of-type(a))rrr$r)rrrrtest_unsupported_pseudoclasssz-TestSoupSelector.test_unsupported_pseudoclasscCs|jjd}|jt|d|j|djd|jjd}|jt|d|j|djd|jjd}|jt|d|jt|jjddS) Nzdiv#inner p:nth-of-type(1)r=rz Some textzdiv#inner p:nth-of-type(3)ZAnotherzdiv#inner p:nth-of-type(4)zdiv p:nth-of-type(0))r$rrr,rrr)rrrrrtest_nth_of_types   z!TestSoupSelector.test_nth_of_typecCs2|jjd}|jt|d|j|djddS)Nzdiv#inner > p:nth-of-type(1)r=rz Some text)r$rrr,r)rrrrr"test_nth_of_type_direct_descendants z3TestSoupSelector.test_nth_of_type_direct_descendantcCs|jddgdS)Nz#inner > p:nth-of-type(2)r)r)rrrr"test_id_child_selector_nth_of_typesz3TestSoupSelector.test_id_child_selector_nth_of_typecCs.|jjddd}|jd}|j|ddgdS)Nrar)rrr)r$r%rr)rrselectedrrrtest_select_on_elements z'TestSoupSelector.test_select_on_elementcCs|jddg|jdgdS)Nz .fancy #innerrz.normal #inner)r)rrrrtest_overspecified_child_idsz,TestSoupSelector.test_overspecified_child_idcCsB|jddg|jddg|jddg|jg|jjddS)Nz#p1 + h2rz #p1 + h2 + prz#p1 + #header2 + .class1z#p1 + p)rrr$r)rrrrtest_adjacent_sibling_selectorsz/TestSoupSelector.test_adjacent_sibling_selectorcCsR|jdddg|jddg|jddg|jddg|jg|jjddS) Nz#p1 ~ h2rrz#p1 ~ #header2z #p1 ~ h2 + arz#p1 ~ h2 + [rel="me"]z #inner ~ h2)rrr$r)rrrrtest_general_sibling_selectors z.TestSoupSelector.test_general_sibling_selectorcCs|jt|jjddS)Nzh1 >)rrr$r)rrrrtest_dangling_combinatorsz)TestSoupSelector.test_dangling_combinatorcCs|jddddgdS)Nz p[lang] ~ pz lang-en-gbz lang-en-uszlang-fr)r)rrrr2test_sibling_combinator_wont_select_same_tag_twiceszCTestSoupSelector.test_sibling_combinator_wont_select_same_tag_twicecCs|jdddgdS)Nzx, 
yxidyid)r)rrrrtest_multiple_selectsz%TestSoupSelector.test_multiple_selectcCs|jdddgdS)Nzx,yrr)r)rrrr"test_multiple_select_with_no_spacesz3TestSoupSelector.test_multiple_select_with_no_spacecCs|jdddgdS)Nzx, yrr)r)rrrr$test_multiple_select_with_more_spacesz5TestSoupSelector.test_multiple_select_with_more_spacecCs|jddgdS)Nzx, xr)r)rrrrtest_multiple_select_duplicatedsz0TestSoupSelector.test_multiple_select_duplicatedcCs|jdddgdS)Nzx, y ~ p[lang=fr]rzlang-fr)r)rrrrtest_multiple_select_siblingsz-TestSoupSelector.test_multiple_select_siblingcCs|jdddgdS)Nzx, y > zrzidb)r)rrrr.test_multiple_select_tag_and_direct_descendantsz?TestSoupSelector.test_multiple_select_tag_and_direct_descendantcCs|jdddddddgdS)Nz div > x, y, zrrzidarzidabzidac)r)rrrr/test_multiple_select_direct_descendant_and_tags sz@TestSoupSelector.test_multiple_select_direct_descendant_and_tagscCs|jdddddddgdS)Nz div x,y, zrrrrr r )r)rrrr(test_multiple_select_indirect_descendant sz9TestSoupSelector.test_multiple_select_indirect_descendantcCs(|jt|jjd|jt|jjddS)Nz,x, yzx,,y)rrr$r)rrrrtest_invalid_multiple_selectsz-TestSoupSelector.test_invalid_multiple_selectcCs|jdddgdS)Nzp[lang=en], p[lang=en-gb]zlang-enz lang-en-gb)r)rrrrtest_multiple_select_attrssz+TestSoupSelector.test_multiple_select_attrscCs|jddddgdS)Nz*x, y > z[id=zida], z[id=zidab], z[id=zidb]rrr )r)rrrrtest_multiple_select_idssz)TestSoupSelector.test_multiple_select_idscCs|jdddgdS)Nzbody > div > x, y > zrr)r)rrrrtest_multiple_select_nestedsz,TestSoupSelector.test_multiple_select_nestedcCsFd}t|d}|jd}|jdt|x|jddgdD]}q:WdS)Nz3
        z html.parserz.c1, .c2r9Zc1Zc2)ro)rrrr,r-)rrr$rrrrrtest_select_duplicate_elementss   z/TestSoupSelector.test_select_duplicate_elementsN)>rrr rrRrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr r r rrrrrrrrrsv1     r)2r1ZpdbrrXrRr6r0Zbs4rZ bs4.builderrrZ bs4.elementrrrr r r r r Z bs4.testingrrrrZ LXML_PRESENTrr!r2rLrPrdrrrrrrrrrrr,rFrQrfrrrrrrr sN  ( ;O83(3--*6n+a4*PK!0iWXX*tests/__pycache__/test_tree.cpython-36.pycnu[3 6]8@sdZddlmZddlZddlZddlZddlZddlmZddl m Z m Z ddl m Z mZmZmZmZmZmZmZddlmZmZe jddk Ze jd dk ZGd d d eZGd d d eZGdddeZGdddeZGdddeZGdddeZ GdddeZ!GdddeZ"GdddeZ#Gddde#Z$Gddde#Z%Gd d!d!eZ&Gd"d#d#e&Z'Gd$d%d%e&Z(Gd&d'd'eZ)Gd(d)d)eZ*Gd*d+d+eZ+Gd,d-d-eZ,Gd.d/d/eZ-Gd0d1d1eZ.Gd2d3d3eZ/Gd4d5d5eZ0Gd6d7d7eZ1dS)8a8Tests for Beautiful Soup's tree traversal methods. The tree traversal methods are the main advantage of using Beautiful Soup over just using a parser. Different parsers will build different Beautiful Soup trees given the same markup, but all Beautiful Soup trees can be traversed with the methods tested here. ) set_traceN) BeautifulSoup)builder_registryHTMLParserTreeBuilder)PY3KCDataComment DeclarationDoctypeNavigableString SoupStrainerTag)SoupTestskipIfZxmlZlxmlc@seZdZddZddZdS)TreeTestcCs|jdd|D|dS)zMake sure that the given tags have the correct text. This is used in tests that define a bunch of tags, each containing a single string, and then select certain strings by some mechanism. cSsg|] }|jqS)string).0tagrr/usr/lib/python3.6/test_tree.py 2sz*TreeTest.assertSelects..N) assertEqual)selftags should_matchrrr assertSelects+szTreeTest.assertSelectscCs|jdd|D|dS)zMake sure that the given tags have the correct IDs. This is used in tests that define a bunch of tags, each containing a single string, and then select certain strings by some mechanism. 
cSsg|] }|dqS)idr)rrrrrr;sz-TreeTest.assertSelectsIDs..N)r)rrrrrrassertSelectsIDs4szTreeTest.assertSelectsIDsN)__name__ __module__ __qualname__rrrrrrr)s rc@s8eZdZdZddZddZddZdd Zd d Zd S) TestFindzBasic tests of the find() method. find() just calls find_all() with limit=1, so it's not tested all that thouroughly here. cCs"|jd}|j|jdjddS)Nz 1234b2)souprfindr)rr$rrr test_find_tagEs zTestFind.test_find_tagcCs"|jd}|j|jddddS)Nu

        Räksmörgås

        u Räksmörgås)r)r$rr%)rr$rrrtest_unicode_text_findIs zTestFind.test_unicode_text_findcCs,|jd}t||jd|jddjdS)Nu&

        here it is

        z here it isu Räksmörgås)r)r$strrr%text)rr$rrrtest_unicode_attribute_findMs z$TestFind.test_unicode_attribute_findcCs"|jd}|jdt|jdS)z)Test an optimization that finds all tags.zfoobarN)r$rlenfind_all)rr$rrrtest_find_everythingSs zTestFind.test_find_everythingcCs$|jd}|jdt|jddS)z;Test an optimization that finds all tags with a given name.zfoobarbazr+aN)r$rr,r-)rr$rrrtest_find_everything_with_nameXs z'TestFind.test_find_everything_with_nameN) rrr __doc__r&r'r*r.r0rrrrr!>s r!c@s8eZdZdZddZddZddZdd Zd d Zd S) TestFindAllz%Basic tests of the find_all() method.cCs|jd}|j|jdddg|j|jdddg|j|jddgdddg|j|jtjdddddg|j|jdddddgd S) z'You can search the tree for text nodes.uFoobar»bar)r)r)Fooz.*»TN)r$rr-recompile)rr$rrrtest_find_all_text_nodes`s  z$TestFindAll.test_find_all_text_nodescCs|jd}|j|jddddddg|j|jddddg|j|jdd ddddd d g|j|jdd ddddd d gd S)z7You can limit the number of items returned by find_all.z(12345r/)limit1r#3 45rN)r$rr-)rr$rrrtest_find_all_limitps zTestFindAll.test_find_all_limitcCs:|jd}|j|ddddg|j|jdddgdS) Nz!123r/r=)r:r;foo)rr<)r$rr")rr$rrr%test_calling_a_tag_is_calling_findall|s z1TestFindAll.test_calling_a_tag_is_calling_findallcCs.|jd}g}|j||jg|j|dS)Nz)r$appendrr-)rr$lrrrTtest_find_all_with_self_referential_data_structure_does_not_cause_infinite_recursions  z`TestFindAll.test_find_all_with_self_referential_data_structure_does_not_cause_infinite_recursioncCs^|jd}|jd}|jt|d|jd}|jt|d|jdd}|jt|ddS)z%All find_all calls return a ResultSetzr/sourceTrB)r)N)r$r- assertTruehasattr)rr$resultrrrtest_find_all_resultsets    z#TestFindAll.test_find_all_resultsetN) rrr r1r8rArCrFrKrrrrr2]s   r2c@seZdZddZdS)TestFindAllBasicNamespacescCs<|jd}|jd|jdj|jd|jddidjdS)Nz04r?z mathml:msqrtr/zsvg:fillZred)attrs)r$rr%rname)rr$rrrtest_find_by_namespaced_names z7TestFindAllBasicNamespaces.test_find_by_namespaced_nameN)rrr rOrrrrrLsrLcspeZdZdZfddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ ddZ ddZZS)TestFindAllByNamez&Test ways of finding tags by tag name.cstt|j|jd|_dS)NzFirst tag. Second tag. 
Third Nested tag. tag.)superrsetUpr$tree)r) __class__rrrRszTestFindAllByName.setUpcCs|j|jjdddgdS)Nr/z First tag.z Nested tag.)rrSr-)rrrrtest_find_all_by_tag_namesz+TestFindAllByName.test_find_all_by_tag_namecCs\|j|jjddddg|j|jjdddddg|j|jjdtjddddgdS)Nr/z First tag.)r)Tz Nested tag.r)rrSr-r6r7)rrrrtest_find_all_by_name_and_textsz0TestFindAllByName.test_find_all_by_name_and_textcCs|j|jjjddgdS)Nr/z Nested tag.)rrScr-)rrrr!test_find_all_on_non_root_elementsz3TestFindAllByName.test_find_all_on_non_root_elementcCs|j|jdddgdS)Nr/z First tag.z Nested tag.)rrS)rrrr%test_calling_element_invokes_find_allsz7TestFindAllByName.test_calling_element_invokes_find_allcCs |j|jjtdddgdS)Nr/z First tag.z Nested tag.)rrSr-r )rrrrtest_find_all_by_tag_strainersz/TestFindAllByName.test_find_all_by_tag_strainercCs"|j|jjddgdddgdS)Nr/r"z First tag.z Second tag.z Nested tag.)rrSr-)rrrrtest_find_all_by_tag_namessz,TestFindAllByName.test_find_all_by_tag_namescCs$|j|jjddddddgdS)NT)r/r"z First tag.z Second tag.z Nested tag.)rrSr-)rrrrtest_find_all_by_tag_dictsz+TestFindAllByName.test_find_all_by_tag_dictcCs$|j|jjtjddddgdS)Nz^[ab]$z First tag.z Second tag.z Nested tag.)rrSr-r6r7)rrrrtest_find_all_by_tag_resz)TestFindAllByName.test_find_all_by_tag_recCs,dd}|jd}|j|j|ddgdS)NcSs|j|jdkS)Nr)rNget)rrrrid_matches_nameszRTestFindAllByName.test_find_all_with_tags_matching_method..id_matches_namezMatch 1. Does not match. Match 2.zMatch 1.zMatch 2.)r$rr-)rr_rSrrr'test_find_all_with_tags_matching_methods z9TestFindAllByName.test_find_all_with_tags_matching_methodcCsx|jd}|jdd}|jdtjd}|jdddg\}}|jd|j|jd|j|jd|j|jd|jdS)NzH
        1
        2
        3
        divza dza br<r;)r$r%r6r7r-rr)rr$Zr1Zr2Zr3Zr4rrr%test_find_with_multi_valued_attributes z7TestFindAllByName.test_find_with_multi_valued_attribute)rrr r1rRrUrVrXrYrZr[r\r]r`rb __classcell__rr)rTrrPs   rPc@seZdZddZddZddZddZd d Zd d Zd dZ ddZ ddZ ddZ ddZ ddZddZddZddZdd Zd!S)"TestFindAllByAttributecCs&|jd}|j|jddddgdS)Nz Matching a. Non-matching Matching b.a. first)rz Matching a.z Matching b.)r$rr-)rrSrrrtest_find_all_by_attribute_namesz6TestFindAllByAttribute.test_find_all_by_attribute_namecCstdjd}djd}|j|}|j|jg|j|d|j|jg|j|jdd|j|jg|j|dgddS)Nuםולשutf8u)titlezsomething else)encoder$rr/r-decode)rZpeacedatar$rrr%test_find_all_by_utf8_attribute_values    zName match. Class match. Non-match. A tag called 'name1'. Zname1)rNzA tag called 'name1'.rN)rMz Name match.classZclass2z Class match.)r$rr-)rrSrrrtest_find_all_by_attribute_dictsz6TestFindAllByAttribute.test_find_all_by_attribute_dictcCs|jd}|j|jddddg|j|jddddg|j|jdd ddg|j|jdddg|j|jdd ddg|j|jdddg|j|jdd dgdS) Nz Class 1. Class 2. Class 1. Class 3 and 4. 
r/r;)class_zClass 1.rWr<zClass 3 and 4.r?)rM)r$rr-)rrSrrrtest_find_all_by_classsz-TestFindAllByAttribute.test_find_all_by_classcCst|jd}|jdtjdd}|j|dg|jdtjdd}|j|dg|jdtjdd}|j|dgdS)Nz#Found itZgaro)rozFound itr/zo b)r$r-r6r7r)rrSfrrr0test_find_by_class_when_multiple_classes_present-s zGTestFindAllByAttribute.test_find_by_class_when_multiple_classes_presentcCsd|jd}|j|jdtjddgdd}|j|jd|gdd}|j|jd|dgdS) NzFound itr/ZbazFound itcSs t|dkS)Nr9)r,)valuerrrbig_attribute_value@sznTestFindAllByAttribute.test_find_all_with_non_dictionary_for_attrs_finds_by_class..big_attribute_valuecSs t|dkS)Nr9)r,)rtrrrsmall_attribute_valueEszpTestFindAllByAttribute.test_find_all_with_non_dictionary_for_attrs_finds_by_class..small_attribute_value)r$rr-r6r7)rr$rurvrrr:test_find_all_with_non_dictionary_for_attrs_finds_by_class;s zQTestFindAllByAttribute.test_find_all_with_non_dictionary_for_attrs_finds_by_classcCs|jd}|jd\}}|j||g|jdd|j|g|jdd|j|g|jddd|j|g|jdd|jg|jdddS)Nz*r/rBr3zfoo bar)rozbar foo)r$r-r)rr$r/Za2rrr:test_find_all_with_string_for_attrs_finds_multiple_classesKs zQTestFindAllByAttribute.test_find_all_with_string_for_attrs_finds_multiple_classescCs0|jd}tddid}|j|j|dgdS)Nzi Match. Non-match.rre)rMzMatch.)r$r rr-)rrSstrainerrrr'test_find_all_by_attribute_soupstrainerWsz>TestFindAllByAttribute.test_find_all_by_attribute_soupstrainercCs&|jd}|j|jddddgdS)NzID present. No ID present. ID is empty.r/)rzNo ID present.)r$rr-)rrSrrr$test_find_all_with_missing_attribute_sz;TestFindAllByAttribute.test_find_all_with_missing_attributecCs&|jd}|j|jddddgdS)NzID present. No ID present. ID is empty.T)rz ID present.z ID is empty.)r$rr-)rrSrrr$test_find_all_with_defined_attributegsz;TestFindAllByAttribute.test_find_all_with_defined_attributecCs>|jd}ddg}|j|jdd||j|jdd|dS)Nz[Unquoted attribute. 
Quoted attribute.zUnquoted attribute.zQuoted attribute.r=)rr;)r$rr-)rrSZexpectedrrr$test_find_all_with_numeric_attributeps z;TestFindAllByAttribute.test_find_all_with_numeric_attributecCs,|jd}|j|jdddgdddgdS)Nz1 2 3 No ID.r;r<r?)r)r$rr-)rrSrrr(test_find_all_with_list_attribute_valuesysz?TestFindAllByAttribute.test_find_all_with_list_attribute_valuescCs,|jd}|j|jtjddddgdS)NzOne a. Two as. Mixed as and bs. One b. No ID.z^a+$)rzOne a.zTwo as.)r$rr-r6r7)rrSrrr5test_find_all_with_regular_expression_attribute_valueszLTestFindAllByAttribute.test_find_all_with_regular_expression_attribute_valuecCsX|jd}|j}|j|g|jddd|jg|jddd|jg|jddddS)Nzfoobarfoor/rB)r)r3)r$r/rr-)rr$r/rrr'test_find_by_name_and_containing_strings  z>TestFindAllByAttribute.test_find_by_name_and_containing_stringcCs*|jd}|j|jd|jddddS)Nz"foofoor/rB)r))r$rr-)rr$rrr=test_find_by_name_and_containing_string_when_string_is_burieds zTTestFindAllByAttribute.test_find_by_name_and_containing_string_when_string_is_buriedcCsB|jd}|j}|j|g|jddd|jg|jddddS)Nz"foofoor+rB)rr)r=r3)r$r/rr-)rr$r/rrr,test_find_by_attribute_and_containing_strings zCTestFindAllByAttribute.test_find_by_attribute_and_containing_stringN)rrr rfrlrnrprsrwrxrzr{r|r}r~rrrrrrrrrds       rdc@seZdZdZddZdS) TestIndexzTest Tag.indexcCsN|jd}|j}x(t|jD]\}}|j||j|qW|jt|jddS)Nah
        Identical Not identical Identical Identical with child Also not identical Identical with child
        r=)r$ra enumeratecontentsrindex assertRaises ValueError)rrSraielementrrr test_indexs zTestIndex.test_indexN)rrr r1rrrrrrsrcs`eZdZdZfddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ Z S)TestParentOperationsz;Test navigation and searching through an element's parents.cs(tt|j|jd|_|jj|_dS)Na1
                Start here
            )rQrrRr$rSr"start)r)rTrrrRszTestParentOperations.setUpcCsF|j|jjdd|j|jjjdd|j|jjjjdddS)Nrbottommiddletop)rrparent)rrrr test_parentsz TestParentOperations.test_parentcCs |jjd}|j|j|jdS)Nr)rSrrr)rZtop_tagrrr%test_parent_of_top_tag_is_soup_objects z:TestParentOperations.test_parent_of_top_tag_is_soup_objectcCs|jd|jjdS)N)rrSr)rrrrtest_soup_object_has_no_parentsz3TestParentOperations.test_soup_object_has_no_parentcCs8|j|jjddddg|j|jjddddgdS)Nulrrr)r)rrZ find_parents)rrrrtest_find_parentssz&TestParentOperations.test_find_parentscCs8|j|jjddd|j|jjddddddS)Nrrrr)r)rr find_parent)rrrrtest_find_parentsz%TestParentOperations.test_find_parentcCs"|jjdd}|j|jjddS)Nz Start here)r)r")rSr%rrrN)rr)rrrtest_parent_of_text_elementsz0TestParentOperations.test_parent_of_text_elementcCs(|jjdd}|j|jddddS)Nz Start here)r)rrr)rSr%rr)rr)rrrtest_text_element_find_parentsz2TestParentOperations.test_text_element_find_parentcCs(dd|jjD}|j|dddgdS)NcSs&g|]}|dk rd|jkr|dqS)Nr)rM)rrrrrrsz>TestParentOperations.test_parent_generator..rrr)rparentsr)rrrrrtest_parent_generatorsz*TestParentOperations.test_parent_generator)rrr r1rRrrrrrrrrrcrr)rTrrs rcseZdZfddZZS) ProximityTestcstt|j|jd|_dS)NzgOneTwoThree)rQrrRr$rS)r)rTrrrRszProximityTest.setUp)rrr rRrcrr)rTrrsrcsTeZdZfddZddZddZddZd d Zd d Zd dZ ddZ Z S)TestNextOperationscstt|j|jj|_dS)N)rQrrRrSr"r)r)rTrrrRszTestNextOperations.setUpcCs*|j|jjd|j|jjjdddS)NOnerr#)rr next_element)rrrr test_nextszTestNextOperations.test_nextcCs |jjdd}|j|jddS)NThree)r))rSr%rr)rZlastrrrtest_next_of_last_item_is_nonesz1TestNextOperations.test_next_of_last_item_is_nonecCs|j|jjddS)N)rrSr)rrrrtest_next_of_root_is_nonesz,TestNextOperations.test_next_of_root_is_nonecCsB|j|jjdddg|jjdd|j|jjdddgdS)Nr"Tworr9)r)rr find_all_next)rrrrtest_find_all_nextsz%TestNextOperations.test_find_all_nextcCs2|j|jjddd|j|jjddddS)Nr"rr#r)r))rr find_next)rrrrtest_find_next 
sz!TestNextOperations.test_find_nextcCs<|jjdd}|j|jdjd|j|jdddgdS)Nr)r)r"rr)rSr%rrrrr)rr)rrrtest_find_next_for_text_elementsz2TestNextOperations.test_find_next_for_text_elementcCsF|jjdd}dd|jD}|\}}|j|dd|j|ddS)Nr)r)cSsg|]}|qSrr)rnoderrrrsz:TestNextOperations.test_next_generator..rr<r)rSr%Z next_elementsr)rrZ successorsrrrrrtest_next_generators z&TestNextOperations.test_next_generator) rrr rRrrrrrrrrcrr)rTrrs rcsTeZdZfddZddZddZddZd d Zd d Zd dZ ddZ Z S)TestPreviousOperationscs"tt|j|jjdd|_dS)Nr)r))rQrrRrSr%end)r)rTrrrRszTestPreviousOperations.setUpcCs*|j|jjdd|j|jjjddS)Nrr<r)rrprevious_element)rrrr test_previous!sz$TestPreviousOperations.test_previouscCs|jjd}|j|jddS)Nhtml)rSr%rr)rrerrr#test_previous_of_first_item_is_none%s z:TestPreviousOperations.test_previous_of_first_item_is_nonecCsdS)Nr)rrrrtest_previous_of_root_is_none)sz4TestPreviousOperations.test_previous_of_root_is_nonecCs6|j|jjddddg|j|jjdddgdS)Nr"rrrr=)r)rrfind_all_previous)rrrrtest_find_all_previous/sz-TestPreviousOperations.test_find_all_previouscCs2|j|jjddd|j|jjddddS)Nr"rr<r)r))rr find_previous)rrrrtest_find_previous7sz)TestPreviousOperations.test_find_previouscCs>|jjdd}|j|jdjd|j|jddddgdS)Nr)r)r"rr)rSr%rrrrr)rr)rrr#test_find_previous_for_text_element;sz:TestPreviousOperations.test_find_previous_for_text_elementcCsh|jjdd}dd|jD}|\}}}}|j|dd|j|jd|j|jd|j|jd dS) Nr)r)cSsg|]}|qSrr)rrrrrrCszBTestPreviousOperations.test_previous_generator..rr;bodyheadr)rSr%Zprevious_elementsrrN)rrZ predecessorsr"rrrrrrtest_previous_generatorAs z.TestPreviousOperations.test_previous_generator) rrr rRrrrrrrrrcrr)rTrrs rcseZdZfddZZS) SiblingTestcs4tt|jd}tjdjd|}|j||_dS)Na z\n\s*)rQrrRr6r7subr$rS)rmarkup)rTrrrRPs zSiblingTest.setUp)rrr rRrcrr)rTrrNsrcsLeZdZfddZddZddZddZd d Zd d Zd dZ Z S)TestNextSiblingcs"tt|j|jjdd|_dS)Nr;)r)rQrrRrSr%r)r)rTrrrRfszTestNextSibling.setUpcCs|j|jjddS)N)rrS 
next_sibling)rrrr!test_next_sibling_of_root_is_nonejsz1TestNextSibling.test_next_sibling_of_root_is_nonecCsB|j|jjdd|j|jjjdd|j|jjdddS)Nrr#r<z1.1)rrrr)rrrrtest_next_siblingmsz!TestNextSibling.test_next_siblingcCsN|j|jjjd|jjdd}|j|jd|jjdd}|j|jddS)Nz1.1)rr?)rrSrrr%)r nested_spanZ last_spanrrrtest_next_sibling_may_not_existts z/TestNextSibling.test_next_sibling_may_not_existcCs|j|jjddddS)Nspanrr#)rrfind_next_sibling)rrrrtest_find_next_sibling}sz&TestNextSibling.test_find_next_siblingcCs6|j|jjddddg|j|jjdddgdS)Nrr#r<r?)r)rrfind_next_siblings)rrrrtest_next_siblingss z"TestNextSibling.test_next_siblingscCsv|jd}|jdd}|j|jjd|j|jjd|j|jddg|j|jddd|j|jddddS)NzFoobarbazr4)r)r"bazr3nonesuch)r$r%rrrNrrr)rr$rrrr"test_next_sibling_for_text_elements  z2TestNextSibling.test_next_sibling_for_text_element) rrr rRrrrrrrrcrr)rTrrds  rcsLeZdZfddZddZddZddZd d Zd d Zd dZ Z S)TestPreviousSiblingcs"tt|j|jjdd|_dS)Nr?)r)rQrrRrSr%r)r)rTrrrRszTestPreviousSibling.setUpcCs|j|jjddS)N)rrSprevious_sibling)rrrr%test_previous_sibling_of_root_is_nonesz9TestPreviousSibling.test_previous_sibling_of_root_is_nonecCsB|j|jjdd|j|jjjdd|j|jjdddS)Nrr<r#z3.1)rrrr)rrrrtest_previous_siblingsz)TestPreviousSibling.test_previous_siblingcCsN|j|jjjd|jjdd}|j|jd|jjdd}|j|jddS)Nz1.1)rr;)rrSrrr%)rrZ first_spanrrr#test_previous_sibling_may_not_exists z7TestPreviousSibling.test_previous_sibling_may_not_existcCs|j|jjddddS)Nrrr<)rrfind_previous_sibling)rrrrtest_find_previous_siblingsz.TestPreviousSibling.test_find_previous_siblingcCs6|j|jjddddg|j|jjdddgdS)Nrr<r#r;)r)rrfind_previous_siblings)rrrrtest_previous_siblingss z*TestPreviousSibling.test_previous_siblingscCsv|jd}|jdd}|j|jjd|j|jjd|j|jddg|j|jddd|j|jddddS)NzFoobarbazr)r)r"r4r3r)r$r%rrrNrrr)rr$rrrr&test_previous_sibling_for_text_elements  z:TestPreviousSibling.test_previous_sibling_for_text_element) rrr rRrrrrrrrcrr)rTrrs  rc@s0eZdZdZddZddZddZdd Zd S) TestTagCreationz$Test the ability to create new tags.cCsd|jd}|jddddid}|jt|t|jd|j|jtddd|j|jd|j dS)NrrBrrNza name)r3rM)r3rN) 
r$new_tagrH isinstancer rrNdictrMr)rr$rrrr test_new_tags  zTestTagCreation.test_new_tagcCstrBtdd}|jd}|jd}|jd|j|jd|jtdd}|jd}|jd}|jd|j|jd|jdS) Nrzlxml-xmlbrps
            s

            z html.parsers

            )XML_BUILDER_PRESENTrrrri)rZxml_soupZxml_brZxml_pZ html_soupZhtml_brZhtml_prrr1test_tag_inherits_self_closing_rules_from_builders      zATestTagCreation.test_tag_inherits_self_closing_rules_from_buildercCs4|jd}|jd}|jd||jt|tdS)NrrB)r$ new_stringrrHrr )rr$srrr'test_new_string_creates_navigablestrings   z7TestTagCreation.test_new_string_creates_navigablestringcCs6|jd}|jdt}|jd||jt|tdS)NrrB)r$rrrrHr)rr$rrrr3test_new_string_can_create_navigablestring_subclasss   zCTestTagCreation.test_new_string_can_create_navigablestring_subclassN)rrr r1rrrrrrrrrs rc@s<eZdZddZddZddZddZd d Zd d Zd dZ ddZ ddZ ddZ ddZ ddZddZddZddZdd Zd!d"Zd#d$Zd%d&Zd'd(Zd)d*Zd+d,Zd-d.Zd/d0Zd1d2Zd3d4Zd5d6Zd7d8Zd9d:Zd;d<Z d=d>Z!d?d@Z"dAdBZ#dCdDZ$dEdFZ%dGdHZ&dIdJZ'dKdLZ(dMS)NTestTreeModificationcCsl|jd}d|jd<|j|j|jd|jd=|j|j|jdd|jd<|j|j|jddS) Nzr+rzzrBZid2z)r$r/rrj document_for)rr$rrrtest_attribute_modifications   z0TestTreeModification.test_attribute_modificationcCsltjd}|jd|d}t||d}t||d}d|d<|jjd||jjd ||j|jjd dS) Nrz )builderr/olzhttp://foo.com/Zhrefrr=s4
              )rlookupr$r rinsertrri)rrr$r/rrrrtest_new_tag_creations   z*TestTreeModification.test_new_tag_creationcCs\d}|j|}|jdd}|j}|jddj|j|j|j||j|j|jddS)NzT

              Don't leave me here.

              Don't leave!

              r#)rzD

              Don't leave me .

              Don't leave!here

              )r$r%r"rDrrrjr)rdocr$Z second_paraZboldrrr!test_append_to_contents_moves_tags   z6TestTreeModification.test_append_to_contents_moves_tagcCs0d}|j|}|j}|j|j}|j||dS)Nz)r$r/ replace_withrWr)rr)r$r/new_arrr1test_replace_with_returns_thing_that_was_replaceds   zFTestTreeModification.test_replace_with_returns_thing_that_was_replacedcCs,d}|j|}|j}|j}|j||dS)Nz)r$r/unwrapr)rr)r$r/rrrr+test_unwrap_returns_thing_that_was_replaceds  z@TestTreeModification.test_unwrap_returns_thing_that_was_replacedcCsJ|jd}|j}|j|jd|j|jt|j|jt|j|j dS)NzFooBar) r$r/extractrrrrrrrW)rr$r/rrrItest_replace_with_and_unwrap_give_useful_exception_when_tag_has_no_parents  z^TestTreeModification.test_replace_with_and_unwrap_give_useful_exception_when_tag_has_no_parentcCs:d}|j|}|j}|jj||j|j|j|dS)Nz-Foo)r$rWrrrjr)rr)r$rWrrrtest_replace_tag_with_itself's   z1TestTreeModification.test_replace_tag_with_itselfcCs&d}|j|}|jt|jj|jdS)Nz)r$rrr"rr/)rr)r$rrr1test_replace_tag_with_its_parent_raises_exception.s zFTestTreeModification.test_replace_tag_with_its_parent_raises_exceptioncCs(d}|j|}|jt|jjd|jdS)Nzr)r$rrr/r)rr)r$rrr,test_insert_tag_into_itself_raises_exception3s zATestTreeModification.test_insert_tag_into_itself_raises_exceptionc Cs|jd}d}|j|}|jd|x|jD]}t|t s,tq,Wt|j\}}}}|jd|j |jd|j |jd|j |jd|j dS) zInserting one BeautifulSoup object into another actually inserts all of its children -- you'll never combine BeautifulSoup objects. z-

              And now, a word:

              And we're back.

              z

              p2

              p3

              r=zAnd now, a word:p2p3zAnd we're back.N) r$rZ descendantsrrAssertionErrorlistZchildrenrr) rr$r)Z to_insertrp1rrZp4rrr1test_insert_beautifulsoup_object_inserts_children8s    zFTestTreeModification.test_insert_beautifulsoup_object_inserts_childrencCsX|jd}|j}|jd}|jdd|j\}}|jd|jd|jd|jjdS)Nz

              onethree

              rr=ZtworZthree)r$r/rrZ replaceWithrr"r)rr$r/r"leftrightrrr3test_replace_with_maintains_next_element_throughoutLs      zHTestTreeModification.test_replace_with_maintains_next_element_throughoutcCsl|jd}|jddjd|jdd}|j}|j|j||j|j||j|jj||j|jddS)Nz Argh!zArgh!)r)zHooray!)r$r%rr"rrrr)rr$new_textr"rrrtest_replace_final_node[s  z,TestTreeModification.test_replace_final_nodecCs|jd}|jjdd|j|j|jd|jdd}|j|jd|j|jj||j|j d|j|j j ||j|j d|j|j|j dS)NzArgh!r=zHooray!z!Argh!Hooray!)r)zArgh!) r$r"rrrjrr%rrrrrW)rr$rrrrtest_consecutive_text_nodeses   z0TestTreeModification.test_consecutive_text_nodescCsT|jd}|jjdd|jjdd|jddg|jj|j|jjdjddS)Nzrr3rB)r$r/rrrr)rr$rrrtest_insert_stringzs  z'TestTreeModification.test_insert_stringcCs|j}|jd|d}t||d}|jdd|jjd||j|j|jd|j}|j|j ||j|j ||j dd }|j|j ||j|j ||j}|j|j ||j|j ||j dd }|j|j||j|j ||j|j |dS) Nz%Findlady!)rZmagictagrther=z=Findthelady!ZFind)r))Zdefault_builderr$r rr/rrjrr"rrr%rrrWr)rrr$Z magic_tagZb_tagr%Zc_tagrrrrtest_insert_tags,      z$TestTreeModification.test_insert_tagcCs0d}|j|}|jj|j|j||jdS)Nz)r$r/rDr"rrj)rrkr$rrr*test_append_child_thats_already_at_the_ends z?TestTreeModification.test_append_child_thats_already_at_the_endcCs2d}|j|}|jjd|j|jd|jdS)Nzrz)r$r/rdrrj)rrkr$rrr$test_move_tag_to_beginning_of_parents z9TestTreeModification.test_move_tag_to_beginning_of_parentcCs.|jd}|jjdd|jt|jddS)Nz
              r=ZContentsz
              Contents
              )r$rrrr()rr$rrr&test_insert_works_on_empty_element_tags z;TestTreeModification.test_insert_works_on_empty_element_tagcCs`|jd}|jjd|jjd|j|j|jd|jj|j|j|j|jddS)NzfoobarBAZQUUXzQUUXfooBAZbarzQUUXbarfooBAZ)r$r" insert_beforer/rrjr)rr$rrrtest_insert_befores   z'TestTreeModification.test_insert_beforecCs`|jd}|jjd|jjd|j|j|jd|jj|j|j|j|jddS)Nzfoobarr r zfooQUUXbarBAZzQUUXbarfooBAZ)r$r" insert_afterr/rrjr)rr$rrrtest_insert_afters   z&TestTreeModification.test_insert_aftercCsR|jd}|jd}|jd}|jt|j||jt|j||jt|j|dS)Nrr/)r$rrrrrNotImplementedError)rr$rrrrr:test_insert_after_raises_exception_if_after_has_no_meanings    zOTestTreeModification.test_insert_after_raises_exception_if_after_has_no_meaningcCsR|jd}|jd}|jd}|jt|j||jt|j||jt|j|dS)Nrr/)r$rrrrrr)rr$rrrrrFtest_insert_before_raises_notimplementederror_if_before_has_no_meanings    z[TestTreeModification.test_insert_before_raises_notimplementederror_if_before_has_no_meaningcCsv|jd}|jd\}}|j||j|j|jd|j|jd|j|j|j|j|jd|j|j ddS)Nz;

              There's no business like show business

              r"z0

              There's business like no business

              noz business) r$r-rrrjrrrrr)rr$rZshowrrrtest_replace_withs z&TestTreeModification.test_replace_withcCs0d}|j|}|jj|j|jd|jdS)Nzz)r$r"rrWrrj)rrkr$rrrtest_replace_first_childs z-TestTreeModification.test_replace_first_childcCs0d}|j|}|jj|j|jd|jdS)Nzz)r$rWrr"rrj)rrkr$rrrtest_replace_last_childs z,TestTreeModification.test_replace_last_childcCs|jd}|j}|j}|j||j|j|jd|j|jd|j|jddj d|j|j d|j|j d|j|j d|j|j|j |j|j d|j|j j |j|j|j d|jdd}|j}|j|j ||j|j ||j|j ||j|j |dS)NzQWereservetherighttorefuseservicez-Werefusetoservicer)r)ZWeZto)r$r"rrrrrjrrr%rrrrr/eg)rr$Z remove_tagZmove_tagZto_textZg_tagrrrtest_nested_tag_replace_withs.   z1TestTreeModification.test_nested_tag_replace_withcCs6|jd}|jj|j|jd|j|jjddS)NzI

              Unneeded formatting is unneeded

              zUnneeded formatting is unneeded)r$emrrrr))rrSrrr test_unwraps  z TestTreeModification.test_unwrapcCsF|jd}|jj|jd}|j|jd|j|j|jddS)NzI wish I was bold.r"zI wish I was bold.)r$rwraprrrjr)rr$rtrrr test_wrap"s  zTestTreeModification.test_wrapcCs4|jd}|jjj|j|j|j|jddS)NzI wish I was bold.zI wish I was bold.)r$r"rrrrjr)rr$rrr%test_wrap_extracts_tag_from_elsewhere)s z:TestTreeModification.test_wrap_extracts_tag_from_elsewherecCsH|jd}|jjj|j|jdt|jj|j|j|jddS)Nz+I like being bold.I wish I was bold.r+z+I like being bold.I wish I was bold.) r$r"rrrr,rrjr)rr$rrr&test_wrap_puts_new_contents_at_the_end/s   z;TestTreeModification.test_wrap_puts_new_contents_at_the_endcCs|jd}|jt|jjd|jddj}|j|jd|j|jd|jt|jjd|j|jd|j|j d|j|j j d|jdd }|jd d }|j|j ||j|j ||j|j ||j|j |dS) NzRSome content. More content.r9Znav)rz6Some content. More content.zr+zSome content. )r)z More content.) r$rr,rrr%rrjrrrrr)rr$Z extractedZ content_1Z content_2rrr test_extract7s"   z!TestTreeModification.test_extractcCsz|jd}|jj}|jj}|jd}|jd}|jj||jj||j|j|j||jj|j||jjdS)NzfoobarrBr3)r$r/rr"rrDrr)rr$Zfoo_1Zbar_1Zfoo_2Zbar_2rrr4test_extract_distinguishes_between_identical_stringsPs     zITestTreeModification.test_extract_distinguishes_between_identical_stringscs8|jdfddjdD|jdtjdS)Nzv csg|]}jjqSr)scriptr)rr)r$rrrmszKTestTreeModification.test_extract_multiples_of_same_tag..r$z )r$r-rr(r)rr)r$r"test_extract_multiples_of_same_tagas z7TestTreeModification.test_extract_multiples_of_same_tagcCs.|jd}|jdj|jd|jddS)Nz hi r)r$r%rr)rr$rrrBtest_extract_works_when_element_is_surrounded_by_identical_stringsqszWTestTreeModification.test_extract_works_when_element_is_surrounded_by_identical_stringscCsf|jd}|j}|jj|jt|jjd|jt|d|j }|jdd|jdt|jdS)z Tag.clear()z4

              String Italicized and another

              rrT)Z decomposeN) r$r/rclearrr,rrHrIr)rr$r/rrrr test_clearzs   zTestTreeModification.test_clearcCsB|jd}d|j_|j|jjdgd|j_|j|jjdgdS)zTag.string = 'string'z rBr3N)r$r/rrrr")rr$rrrtest_string_sets  z$TestTreeModification.test_string_setcCs,|jd}|jj|j_|j|jjddS)Nzfoobarsbarbar)r$rWrr"rr/ri)rr$rrr/test_string_set_does_not_affect_original_strings  zDTestTreeModification.test_string_set_does_not_affect_original_stringcCs2|jd}td}||j_|jt|jjtdS)NzrB)r$rr/rrHr)rr$cdatarrr)test_set_string_preserves_class_of_strings z>TestTreeModification.test_set_string_preserves_class_of_stringN))rrr rrrrrrrrrrrrrrrrr r rrrrrrrrrrr r!r"r#r%r&r(r)r*r,rrrrrsL       $ rc@sxeZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ ddZ ddZddZddZdS)TestElementObjectsz)Test various features of element objects.cCsV|jd}|jt|jd|jt|d|jt|jd|jt|jjddS)z3The length of an element is its number of children.z123r=r9N)r$rr,rr)rr$rrrtest_lens  zTestElementObjects.test_lencCsL|jd}|j|j|jd|j|jj|jdjd|j|jddS)z2Accessing a Python member .foo invokes find('foo')zr"rN)r$rr"r%rr/)rr$rrrtest_member_access_invokes_finds z2TestElementObjects.test_member_access_invokes_findc CsP|jd}tjdd }|j}WdQRX|j|j||jdt|djdS)NzT)recordzp.bTag is deprecated, use .find("b") instead. If you really were looking for a tag called bTag, use .find("bTag")r)r$warningscatch_warningsZbTagrr"r(message)rr$wrrrrtest_deprecated_member_accesss z0TestElementObjects.test_deprecated_member_accesscCs2|jd}|j|jjd|j|jjddS)zhas_attr() checks for the presence of an attribute. Please note note: has_attr() is different from __in__. has_attr() checks the tag's attributes and __in__ checks the tag's chidlren. 
zattrZattr2N)r$rHrBhas_attr assertFalse)rr$rrr test_has_attrs z TestElementObjects.test_has_attrcCsd}|j|ddS)Nz%z%)ZassertSoupEquals)rrrrr.test_attributes_come_out_in_alphabetical_orderszATestElementObjects.test_attributes_come_out_in_alphabetical_ordercCs|jd}|j|jjddS)Nz foorB)r$rr"r)rr$rrr test_strings zTestElementObjects.test_stringcCs|jd}|j|jjddS)Nz)r$rr"r)rr$rrrtest_empty_tag_has_no_strings z/TestElementObjects.test_empty_tag_has_no_stringcCs`|jd}|j|jjd|jd}|j|jjd|jd}|jjdd|j|jjddS)Nzfoo
              zfoobarz foor=r3)r$rr"rr/r)rr$rrr-test_tag_with_multiple_children_has_no_strings   z@TestElementObjects.test_tag_with_multiple_children_has_no_stringcCs,|jd}|j|jjd|j|jddS)NzfoorB)r$rr/r)rr$rrr)test_tag_with_recursive_string_has_strings zfeozN)r$r8r"r)rr$rrrtest_lack_of_strings  z&TestElementObjects.test_lack_of_stringcCs`|jd}|j|jjd|j|jjddd|j|jjdd|j|jjddddd S) zBTag.text and Tag.get_text(sep=u"") -> all child text, concatenatedzar t zar t T)stripZart,z a,r, , t za,r,tN)r$rr/r)get_text)rr$rrr test_all_texts  z TestElementObjects.test_all_textcCsJ|jd}|j|jd|j|jttfdd|j|jddddS)NzfoobarZfoobar)typesZ fooIGNOREbar)r$rrBr r)rr$rrrtest_get_text_ignores_commentss  z1TestElementObjects.test_get_text_ignores_commentscCs$|jd}|jddgt|jdS)NzfoobarrBr3)r$rrZstrings)rr$rrr!test_all_strings_ignores_commentss z4TestElementObjects.test_all_strings_ignores_commentsN)rrr r1r.r/r5r9r:r;r<r=r>r?rCrErFrrrrr-s   r-c@sPeZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ dS)TestCDAtaListAttributesz0Testing cdata-list attributes like 'class'. 
cCs"|jd}|jdg|jddS)NzrBrm)r$rr/)rr$rrrtest_single_value_becomes_list s z6TestCDAtaListAttributes.test_single_value_becomes_listcCs$|jd}|jddg|jddS)NzrBr3rm)r$rr/)rr$rrr!test_multiple_values_becomes_lists z9TestCDAtaListAttributes.test_multiple_values_becomes_listcCs&|jd}|jdddg|jddS)NzrBr3rrm)r$rr/)rr$rrr2test_multiple_values_separated_by_weird_whitespaces zJTestCDAtaListAttributes.test_multiple_values_separated_by_weird_whitespacecCs |jd}|jd|jjdS)Nzs)r$rr/ri)rr$rrr,test_attributes_joined_into_string_on_outputs zDTestCDAtaListAttributes.test_attributes_joined_into_string_on_outputcCs$|jd}|jdg|jjddS)Nzzabc defr)r$rr/Zget_attribute_list)rr$rrrtest_get_attribute_lists z/TestCDAtaListAttributes.test_get_attribute_listcCs$|jd}|jddg|jddS)Nz(z ISO-8859-1zUTF-8zaccept-charset)r$rZform)rr$rrrtest_accept_charset!s z+TestCDAtaListAttributes.test_accept_charsetcCs$d}|j|}|jd|jddS)Nz)zISO-8859-1 UTF-8zaccept-charset)r$rr/)rrkr$rrr-test_cdata_attribute_applying_only_to_one_tag%s zETestCDAtaListAttributes.test_cdata_attribute_applying_only_to_one_tagcs6|jdj|jdjfdd}|jt|dS)Nrcs d_dS)NrB)rNr)rrrt0szJTestCDAtaListAttributes.test_string_has_immutable_name_property..t)r$rrrNrAttributeError)rrOr)rr'test_string_has_immutable_name_property-s  z?TestCDAtaListAttributes.test_string_has_immutable_name_propertyN) rrr r1rHrIrJrKrLrMrNrQrrrrrG srGcs`eZdZdZfddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ Z S)TestPersistencez*Testing features like pickle and deepcopy.cs&tt|jd|_|j|j|_dS)Nay Beautiful Soup: We called him Tortoise because he taught us. foo bar )rQrRrRZpager$rS)r)rTrrrR7szTestPersistence.setUpcCs@tj|jd}tj|}|j|jt|j|j|jjdS)Nr+)pickledumpsrSloadsrrTrrj)rdumpedloadedrrr!test_pickle_and_unpickle_identityKs z1TestPersistence.test_pickle_and_unpickle_identitycCs&tj|j}|j|j|jjdS)N)copydeepcopyrSrrj)rZcopiedrrrtest_deepcopy_identitySs z&TestPersistence.test_deepcopy_identitycCs:tdd}|j}|j}|jdt||j||jdS)Ns

               

              z html.parseru

               

              )rZoriginal_encoding__copy__rr()rr$encodingrYrrrtest_copy_preserves_encodingXs  z,TestPersistence.test_copy_preserves_encodingcCs>d}|j|}tj|tj}tj|}|j|j|jdS)Nu )r$rSrTZHIGHEST_PROTOCOLrUrrj)rrr$rVrWrrrtest_unicode_pickle_s   z#TestPersistence.test_unicode_picklecCszd}|j|}|jdd}tj|}|j|||jd|j|jd|j|jd|j|jd|j|jd|jdS)NzFooBarr4)r) r$r%rYrrrassertNotEqualrr)rrr$s1s2rrr1test_copy_navigablestring_is_not_attached_to_treegs    zATestPersistence.test_copy_navigablestring_is_not_attached_to_treecCs>d}|j|}|j}tj|}|j|||jt|tdS)Nz)r$rrYrrHrr)rrr$rarbrrr0test_copy_navigablestring_subclass_has_same_typess    z@TestPersistence.test_copy_navigablestring_subclass_has_same_typecCs(d}|j|}tj|}|j||dS)Nz)
              FooBar
              end)r$rYr)rrr$Z soup_copyrrrtest_copy_entire_soup{s  z%TestPersistence.test_copy_entire_soupcCsd}|j|}|j}tj|}|jt|t||j|||j||k|jd|j|jd|j|jd|jddj |j d|jddj dS)Nz)
              FooBar
              endZBar)r) r$rarYrr(r8rrr%rr`)rrr$raZdiv_copyrrrtest_copy_tag_copies_contentss   z-TestPersistence.test_copy_tag_copies_contents)rrr r1rRrXr[r^r_rcrdrerfrcrr)rTrrR4s  rRc@seZdZddZddZddZddZd d Zd d Zd dZ ddZ ddZ ddZ ddZ ddZddZddZddZdd Zd!S)"TestSubstitutionscCs0d}|j|}|jdd}|j||jddS)Nu#<<Sacré bleu!>>minimal) formatter)r$rjrr)rrr$decodedrrr!test_default_formatter_is_minimals  z3TestSubstitutions.test_default_formatter_is_minimalcCs0d}|j|}|jdd}|j||jddS)Nu'
              <<Sacré bleu!>>r)riz.
              <<Sacré bleu!>>)r$rjrr)rrr$rjrrrtest_formatter_htmls   z%TestSubstitutions.test_formatter_htmlcCs0d}|j|}|jdd}|j||jddS)Nu'
              <<Sacré bleu!>>Zhtml5)riz-
              <<Sacré bleu!>>)r$rjrr)rrr$rjrrrtest_formatter_html5s   z&TestSubstitutions.test_formatter_html5cCs0d}|j|}|jdd}|j||jddS)Nu#<<Sacré bleu!>>rh)ri)r$rjrr)rrr$rjrrrtest_formatter_minimals  z(TestSubstitutions.test_formatter_minimalcCs0d}|j|}|jdd}|j||jddS)Nu#<<Sacré bleu!>>)riu<>)r$rjrr)rrr$rjrrrtest_formatter_nulls   z%TestSubstitutions.test_formatter_nullcCs4d}|j|}|jddd}|j||jddS)Nz!<foo>bar
              cSs|jS)N)upper)xrrrsz9TestSubstitutions.test_formatter_custom..)rizBAR
              )r$rjrr)rrr$rjrrrtest_formatter_customs  z'TestSubstitutions.test_formatter_customcCsd}|j|}|j}d}|j||j|j||jddd}|j||jdd|j||jddd}|j||jdd ddS) Nu%eu)erh)riz/eru%EcSs|jS)N)rp)rqrrrrrszMTestSubstitutions.test_formatter_is_run_on_attribute_values..)r$r/rrj)rrr$r/Zexpect_minimalZ expect_htmlZ expect_upperrrr)test_formatter_is_run_on_attribute_valuess z;TestSubstitutions.test_formatter_is_run_on_attribute_valuescCs$d}t|dj}|jd|kdS)NzO z html.parsers < < hey > >)rrirH)rrencodedrrr2test_formatter_skips_script_tag_for_html_documentsszDTestSubstitutions.test_formatter_skips_script_tag_for_html_documentscCs$d}t|dj}|jd|kdS)NzF z html.parsers < < hey > >)rrirH)rrrurrr1test_formatter_skips_style_tag_for_html_documentsszCTestSubstitutions.test_formatter_skips_style_tag_for_html_documentscCs |jd}|jd|jjdS)Nz*
              foo
                	bar
                
                
              baz z/
              foo
                	bar
                
                
              baz
              )r$rraprettify)rr$rrr,test_prettify_leaves_preformatted_text_alones z>TestSubstitutions.test_prettify_leaves_preformatted_text_alonecCs,tdd}|jddd}|jd|kdS)Nzfooz html.parsercSs|jS)N)rp)rqrrrrrszLTestSubstitutions.test_prettify_accepts_formatter_function..)riZFOO)rrxrH)rr$Zprettyrrr(test_prettify_accepts_formatter_functions z:TestSubstitutions.test_prettify_accepts_formatter_functioncCs"|jd}|jtt|jdS)Nz)r$rr(typerx)rr$rrr(test_prettify_outputs_unicode_by_defaults z:TestSubstitutions.test_prettify_outputs_unicode_by_defaultcCs$|jd}|jtt|jddS)Nzzutf-8)r$rbytesr{rx)rr$rrrtest_prettify_can_encode_datas z/TestSubstitutions.test_prettify_can_encode_datacCs0d}|j|}|jjd}|j||jddS)NuSacré bleu!zutf-8)r$r"rir)rrr$rurrr,test_html_entity_substitution_off_by_defaults  z>TestSubstitutions.test_html_entity_substitution_off_by_defaultcCsd}|j|}|j|jdd|jd}|jd|k|jd}|jd|k|jd}|jd |k|jd jd }|jd |kdS) NzEZcontentztext/html; charset=x-sjiszutf-8s charset=utf-8euc_jpscharset=euc_jpz shift-jisscharset=shift-jiszutf-16zcharset=utf-16)r$rmetarirHrj)rZmeta_tagr$utf_8r shift_jisZutf_16_urrrtest_encoding_substitution s    z,TestSubstitutions.test_encoding_substitutioncCs2d}td}|j||d}|j|jdjddS)Nz`
              foo
              Zpre)Z parse_onlyr)r r$rrrN)rrryr$rrr;test_encoding_substitution_doesnt_happen_if_tag_is_strained$szMTestSubstitutions.test_encoding_substitution_doesnt_happen_if_tag_is_strainedN)rrr rkrlrmrnrorsrtrvrwryrzr|r~rrrrrrrrgs       rgc@sPeZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ dS) TestEncodingz0Test the ability to encode objects into strings.cCs.d}|j|}|j|jjjddjddS)Nu zutf-8u☃)r$rr"rri)rrr$rrr"test_unicode_string_can_be_encoded2s z/TestEncoding.test_unicode_string_can_be_encodedcCs,d}|j|}|j|jjd|jddS)Nu zutf-8)r$rr"ri)rrr$rrr1test_tag_containing_unicode_string_can_be_encoded8s z>TestEncoding.test_tag_containing_unicode_string_can_be_encodedcCs&d}|j|}|j|jjdddS)Nu asciis)r$rr"ri)rrr$rrrs zITestEncoding.test_encoding_substitutes_unrecognized_characters_by_defaultcCs&d}|j|}|jt|jddddS)Nu rstrict)errors)r$rUnicodeEncodeErrorri)rrr$rrr test_encoding_can_be_made_strictCs z-TestEncoding.test_encoding_can_be_made_strictcCs$d}|j|}|jd|jjdS)Nu u☃)r$rr"Zdecode_contents)rrr$rrrtest_decode_contentsIs z!TestEncoding.test_decode_contentscCs.d}|j|}|jdjd|jjdddS)Nu u☃rg)r])r$rrir"Zencode_contents)rrr$rrrtest_encode_contentsNs  z!TestEncoding.test_encode_contentscCs*d}|j|}|jdjd|jjdS)Nu u☃rg)r$rrir"ZrenderContents)rrr$rrrtest_deprecated_renderContentsUs z+TestEncoding.test_deprecated_renderContentscCs8d}|j|}tr$|j|t|n|jdt|dS)Nu s \u2603)r$rrrepr)rrr$rrr test_repr[s  zTestEncoding.test_reprN) rrr r1rrrrrrrrrrrrr/src@s,eZdZddZddZddZddZd S) TestNavigableStringSubclassescCsX|jd}td}|jd||jt|d|j|jddd|j|jdddS)NrrBr=z)r)r)r$rrrr(r%r)rr$r+rrr test_cdataes   z(TestNavigableStringSubclasses.test_cdatacsVd_fdd}jd}td}|jd|jd|j|djdjd S) zkText inside a CData object is passed into the formatter. But the return value is ignored. 
rcsjd7_dS)Nr=zBITTER FAILURE)count)args)rrr incrementvszNTestNavigableStringSubclasses.test_cdata_is_never_formatted..incrementrz<><><>r=s<><>]]>)riN)rr$rrrri)rrr$r+r)rrtest_cdata_is_never_formattedos   z;TestNavigableStringSubclasses.test_cdata_is_never_formattedcCs2td}|jd}|jd||j|jddS)NrBrr=s )r r$rrri)rZdoctyper$rrrtest_doctype_ends_in_newlines  z:TestNavigableStringSubclasses.test_doctype_ends_in_newlinecCstd}|jd|jdS)NrBz)r rZ output_ready)rr rrrtest_declarationsz.TestNavigableStringSubclasses.test_declarationN)rrr rrrrrrrrrcs rc@seZdZdZddZddZeZddZdd Zd d Z d d Z ddZ ddZ ddZ ddZddZddZddZddZddZd d!Zd"d#Zd$d%Zd&d'Zd(d)Zd*d+Zd,d-Zd.d/Zd0d1Zd2d3Zd4d5Zd6d7Zd8d9Z d:d;Z!dd?Z#d@dAZ$dBdCZ%dDdEZ&dFdGZ'dHdIZ(dJdKZ)dLdMZ*dNdOZ+dPdQZ,dRdSZ-dTdUZ.dVdWZ/dXdYZ0dZd[Z1d\d]Z2d^d_Z3d`daZ4dbdcZ5dddeZ6dfdgZ7dhdiZ8djdkZ9dldmZ:dndoZ;dpdqZ The title Hello there.

              An H1

              Some text

              Some more text

              An H2

              Another

              Bob

              Another H2

              me span1a1 span1a2 test span2a1

              English

              English UK

              English US

              French

              cCst|jd|_dS)Nz html.parser)rHTMLr$)rrrrrRszTestSoupSelector.setUpcKsRdd|jj|f|D}|j|j|j||d|dj|dj|fdS)NcSsg|] }|dqS)rr)relrrrrsz2TestSoupSelector.assertSelects..z$Selector %s, expected [%s], got [%s]z, )r$selectsortrjoin)rselector expected_idskwargsZel_idsrrrrs zTestSoupSelector.assertSelectscGs"x|D]\}}|j||qWdS)N) assertSelect)rZtestsrrrrrassertSelectMultiplesz%TestSoupSelector.assertSelectMultiplecCsF|jjd}|jt|d|j|djd|j|djdgdS)Nrhr=rz The title)r$rrr,rNr)relsrrrtest_one_tag_ones z!TestSoupSelector.test_one_tag_onecCsX|jjd}|jt|dx|D]}|j|jdq"W|jjd}|jd|ddS)Nramainr)r$rrr,rN select_one)rrrarrrrtest_one_tag_manys    z"TestSoupSelector.test_one_tag_manycCs|jjd}|jd|dS)NZnonexistenttag)r$rr)rmatchrrr(test_select_one_returns_none_if_no_matchs z9TestSoupSelector.test_select_one_returns_none_if_no_matchcCs |jjd}|jdddgdS)Nzdiv divinnerdata1)r$rr)rrrrrtest_tag_in_tag_ones z$TestSoupSelector.test_tag_in_tag_onecCs&x dD]}|j|ddddgqWdS) Nhtml div html body divbody divrrrfooter)rrr)r)rrrrrtest_tag_in_tag_manys z%TestSoupSelector.test_tag_in_tag_manycCsB|jddgdd|jdddgdd|jdd ddd gd ddS) Nzhtml divrr=)r:z html body divrr+zbody divrrr>)r)rrrr test_limitszTestSoupSelector.test_limitcCs|jt|jjdddS)Ndelr)rr,r$r)rrrrtest_tag_no_matchsz"TestSoupSelector.test_tag_no_matchcCs|jt|jjddS)Nztag%t)rrr$r)rrrrtest_invalid_tagsz!TestSoupSelector.test_invalid_tagcCs|jdddgdS)Nzcustom-dashed-tagdash1dash2)r)rrrrtest_select_dashed_tag_idssz+TestSoupSelector.test_select_dashed_tag_idscCs6|jjd}|j|djd|j|ddddS)Nzcustom-dashed-tag[id="dash2"]rzcustom-dashed-tagrr)r$rrrN)rZdashedrrrtest_select_dashed_by_ids z)TestSoupSelector.test_select_dashed_by_idcCs|j|jjddjddS)Nzbody > custom-dashed-tagrz Hello there.)rr$rr))rrrrtest_dashed_tag_textsz%TestSoupSelector.test_dashed_tag_textcCs |j|jjd|jjddS)Nzcustom-dashed-tag)rr$rr-)rrrr#test_select_dashed_matches_find_allsz4TestSoupSelector.test_select_dashed_matches_find_allcCs|jddgfdddgfdS)NZh1header1Zh2header2header3)r)rrrrtest_header_tags 
sz!TestSoupSelector.test_header_tagscCsVxPd D]H}|jj|}|jt|d|j|djd|j|dddgqWdS) N.onepp.onep html p.onepr=rrrmonep)rrr)r$rrr,rN)rrrrrrtest_class_ones   zTestSoupSelector.test_class_onecCs |jjd}|jt|ddS)Nzdiv.onepr)r$rrr,)rrrrrtest_class_mismatched_tags z*TestSoupSelector.test_class_mismatched_tagcCs xdD]}|j|dgqWdS)N div#inner#inner div div#innerr)rrr)r)rrrrr test_one_ids zTestSoupSelector.test_one_idcCs |jjd}|jt|ddS)Nz #doesnotexistr)r$rrr,)rrrrr test_bad_ids zTestSoupSelector.test_bad_idcCsf|jjd}|jt|dx|D]}|j|jdq"W|j|dddg|j|djddS)Nz div#inner pr9rr=rmrr)r$rrr,rNr8r7)rrrrrrtest_items_in_id#s   z!TestSoupSelector.test_items_in_idcCs*x$dD]}|jt|jj|dqWdS)N div#main deldiv#main div.oops div div#mainr)rrr)rr,r$r)rrrrrtest_a_bunch_of_emptys+s z'TestSoupSelector.test_a_bunch_of_emptyscCs xd D]}|j|d gqWdS) N.class1p.class1.class2p.class2.class3p.class3 html p.class2div#inner .class2pmulti)rrrrrrrr)r)rrrrrtest_multi_class_support/sz)TestSoupSelector.test_multi_class_supportcCs xdD]}|j|dgqWdS)N.class1.class3.class3.class2.class1.class2.class3r)rrr)r)rrrrrtest_multi_class_selection4sz+TestSoupSelector.test_multi_class_selectioncCs"|jdddg|jddgdS)Nz.s1 > as1a1s1a2z .s1 > a spans1a2s1)r)rrrrtest_child_selector9sz$TestSoupSelector.test_child_selectorcCs|jddgdS)Nz.s1 > a#s1a2 spanr)r)rrrrtest_child_selector_id=sz'TestSoupSelector.test_child_selector_idcCst|jddgfddgfddgfddgfddgfddgfd dgfd gfd dgfd dgfd dgfdgfdgfdgfdS)Nzp[class="onep"]rz p[id="p1"]z[class="onep"]z [id="p1"]zlink[rel="stylesheet"]l1zlink[type="text/css"]zlink[href="blah.css"]zlink[href="no-blah.css"]z[rel="stylesheet"]z[type="text/css"]z[href="blah.css"]z[href="no-blah.css"]zp[href="no-blah.css"])r)rrrrtest_attribute_equals@sz&TestSoupSelector.test_attribute_equalsc Cs\|jddgfddgfddgfddgfddgfddgfdd gfd d gfd d gfd d gf dS) Nzp[class~="class1"]rzp[class~="class2"]zp[class~="class3"]z[class~="class1"]z[class~="class2"]z[class~="class3"]za[rel~="friend"]bobz a[rel~="met"]z[rel~="friend"]z 
[rel~="met"])r)rrrrtest_attribute_tildeRsz%TestSoupSelector.test_attribute_tildecCsv|jddgfddgfdgfdgfdgfddgfdd d gfd d d gfd d dgfdd dgfddgfdd gfddgf dS)Nz[rel^="style"]rzlink[rel^="style"]znotlink[rel^="notstyle"]z[rel^="notstyle"]zlink[rel^="notstyle"]zlink[href^="bla"]za[href^="http://"]rmez[href^="http://"]z [id^="p"]rrz [id^="m"]rz div[id^="m"]z a[id^="m"]zdiv[data-tag^="dashed"]r)r)rrrrtest_attribute_startswith`s    z*TestSoupSelector.test_attribute_startswithc CsH|jddgfddgfddgfdddddd d d d gfd dgfdgfdS)Nz[href$=".css"]rzlink[href$=".css"]z link[id$="1"]z [id$="1"]rrrrs2a1rrz div[id$="1"]z[id$="noending"])r)rrrrtest_attribute_endswithqsz(TestSoupSelector.test_attribute_endswithcCs|jddgfddgfdgfdgfdgfddgfdd d gfd d d gfddgfdd gfddgfddgfddgfdddd ddddddg fddgfdgfdd d dgfdd d gfd dgfd!dd"gfd#d"gfd$dgfdS)%Nz[rel*="style"]rzlink[rel*="style"]znotlink[rel*="notstyle"]z[rel*="notstyle"]zlink[rel*="notstyle"]zlink[href*="bla"]z[href*="http://"]rrz [id*="p"]rrz div[id*="m"]rz a[id*="m"]z[href*=".css"]zlink[href*=".css"]z link[id*="1"]z [id*="1"]rrrrrrrz div[id*="1"]z[id*="noending"]z [href*="."]z a[href*="."]zlink[href*="."]z div[id*="n"]rz div[id*="nn"]zdiv[data-tag*="edval"])r)rrrrtest_attribute_contains{s.     z(TestSoupSelector.test_attribute_containscCs2|jddddgfddddgfddgfdgfdS) Nz p[lang|="en"]zlang-enz lang-en-gbz lang-en-usz [lang|="en"]z p[lang|="fr"]zlang-frz p[lang|="gb"])r)rrrrtest_attribute_exact_or_hypens   z.TestSoupSelector.test_attribute_exact_or_hypenc CsV|jddddgfddgfdddgfddd d d gfd d dgfdgfdgfddgfdS)Nz[rel]rrrz link[rel]za[rel]z[lang]zlang-enz lang-en-gbz lang-en-uszlang-frzp[class]rrz[blah]zp[blah]z div[data-tag]r)r)rrrrtest_attribute_existss   z&TestSoupSelector.test_attribute_existscCs,d}t|d}|jd\}|jd|jdS)Nz]
              nope
              yes
              z html.parserzdiv[style="display: right"]yes)rrrr)rrr$Zchosenrrr"test_quoted_space_in_selector_names  z3TestSoupSelector.test_quoted_space_in_selector_namecCs(|jt|jjd|jt|jjddS)Nza:no-such-pseudoclassza:nth-of-type(a))rrr$r)rrrrtest_unsupported_pseudoclasssz-TestSoupSelector.test_unsupported_pseudoclasscCs|jjd}|jt|d|j|djd|jjd}|jt|d|j|djd|jjd}|jt|d|jt|jjddS) Nzdiv#inner p:nth-of-type(1)r=rz Some textzdiv#inner p:nth-of-type(3)ZAnotherzdiv#inner p:nth-of-type(4)zdiv p:nth-of-type(0))r$rrr,rrr)rrrrrtest_nth_of_types   z!TestSoupSelector.test_nth_of_typecCs2|jjd}|jt|d|j|djddS)Nzdiv#inner > p:nth-of-type(1)r=rz Some text)r$rrr,r)rrrrr"test_nth_of_type_direct_descendants z3TestSoupSelector.test_nth_of_type_direct_descendantcCs|jddgdS)Nz#inner > p:nth-of-type(2)r)r)rrrr"test_id_child_selector_nth_of_typesz3TestSoupSelector.test_id_child_selector_nth_of_typecCs.|jjddd}|jd}|j|ddgdS)Nrar)rrr)r$r%rr)rrselectedrrrtest_select_on_elements z'TestSoupSelector.test_select_on_elementcCs|jddg|jdgdS)Nz .fancy #innerrz.normal #inner)r)rrrrtest_overspecified_child_idsz,TestSoupSelector.test_overspecified_child_idcCsB|jddg|jddg|jddg|jg|jjddS)Nz#p1 + h2rz #p1 + h2 + prz#p1 + #header2 + .class1z#p1 + p)rrr$r)rrrrtest_adjacent_sibling_selectorsz/TestSoupSelector.test_adjacent_sibling_selectorcCsR|jdddg|jddg|jddg|jddg|jg|jjddS) Nz#p1 ~ h2rrz#p1 ~ #header2z #p1 ~ h2 + arz#p1 ~ h2 + [rel="me"]z #inner ~ h2)rrr$r)rrrrtest_general_sibling_selectors z.TestSoupSelector.test_general_sibling_selectorcCs|jt|jjddS)Nzh1 >)rrr$r)rrrrtest_dangling_combinatorsz)TestSoupSelector.test_dangling_combinatorcCs|jddddgdS)Nz p[lang] ~ pz lang-en-gbz lang-en-uszlang-fr)r)rrrr2test_sibling_combinator_wont_select_same_tag_twiceszCTestSoupSelector.test_sibling_combinator_wont_select_same_tag_twicecCs|jdddgdS)Nzx, 
yxidyid)r)rrrrtest_multiple_selectsz%TestSoupSelector.test_multiple_selectcCs|jdddgdS)Nzx,yrr)r)rrrr"test_multiple_select_with_no_spacesz3TestSoupSelector.test_multiple_select_with_no_spacecCs|jdddgdS)Nzx, yrr)r)rrrr$test_multiple_select_with_more_spacesz5TestSoupSelector.test_multiple_select_with_more_spacecCs|jddgdS)Nzx, xr)r)rrrrtest_multiple_select_duplicatedsz0TestSoupSelector.test_multiple_select_duplicatedcCs|jdddgdS)Nzx, y ~ p[lang=fr]rzlang-fr)r)rrrrtest_multiple_select_siblingsz-TestSoupSelector.test_multiple_select_siblingcCs|jdddgdS)Nzx, y > zrzidb)r)rrrr.test_multiple_select_tag_and_direct_descendantsz?TestSoupSelector.test_multiple_select_tag_and_direct_descendantcCs|jdddddddgdS)Nz div > x, y, zrrzidarzidabzidac)r)rrrr/test_multiple_select_direct_descendant_and_tags sz@TestSoupSelector.test_multiple_select_direct_descendant_and_tagscCs|jdddddddgdS)Nz div x,y, zrrr rr r )r)rrrr(test_multiple_select_indirect_descendant sz9TestSoupSelector.test_multiple_select_indirect_descendantcCs(|jt|jjd|jt|jjddS)Nz,x, yzx,,y)rrr$r)rrrrtest_invalid_multiple_selectsz-TestSoupSelector.test_invalid_multiple_selectcCs|jdddgdS)Nzp[lang=en], p[lang=en-gb]zlang-enz lang-en-gb)r)rrrrtest_multiple_select_attrssz+TestSoupSelector.test_multiple_select_attrscCs|jddddgdS)Nz*x, y > z[id=zida], z[id=zidab], z[id=zidb]rrr )r)rrrrtest_multiple_select_idssz)TestSoupSelector.test_multiple_select_idscCs|jdddgdS)Nzbody > div > x, y > zrr)r)rrrrtest_multiple_select_nestedsz,TestSoupSelector.test_multiple_select_nestedcCsRd}t|d}|jd}|jdt|x$|jddgdD]}||ks:tq:WdS)Nz3
# tests/__init__.py
"The beautifulsoup tests."

# tests/test_builder_registry.py
"""Tests of the builder registry."""

import unittest
import warnings

from bs4 import BeautifulSoup
from bs4.builder import (
    builder_registry as registry,
    HTMLParserTreeBuilder,
    TreeBuilderRegistry,
)

try:
    from bs4.builder import HTML5TreeBuilder
    HTML5LIB_PRESENT = True
except ImportError:
    HTML5LIB_PRESENT = False

try:
    from bs4.builder import (
        LXMLTreeBuilderForXML,
        LXMLTreeBuilder,
        )
    LXML_PRESENT = True
except ImportError:
    LXML_PRESENT = False


class BuiltInRegistryTest(unittest.TestCase):
    """Test the built-in registry with the default builders registered."""

    def test_combination(self):
        if LXML_PRESENT:
            self.assertEqual(registry.lookup('fast', 'html'),
                             LXMLTreeBuilder)

        if LXML_PRESENT:
            self.assertEqual(registry.lookup('permissive', 'xml'),
                             LXMLTreeBuilderForXML)
        self.assertEqual(registry.lookup('strict', 'html'),
                         HTMLParserTreeBuilder)
        if HTML5LIB_PRESENT:
            self.assertEqual(registry.lookup('html5lib', 'html'),
                             HTML5TreeBuilder)

    def test_lookup_by_markup_type(self):
        if LXML_PRESENT:
            self.assertEqual(registry.lookup('html'), LXMLTreeBuilder)
            self.assertEqual(registry.lookup('xml'), LXMLTreeBuilderForXML)
        else:
            self.assertEqual(registry.lookup('xml'), None)
            if HTML5LIB_PRESENT:
                self.assertEqual(registry.lookup('html'), HTML5TreeBuilder)
            else:
                self.assertEqual(registry.lookup('html'), HTMLParserTreeBuilder)

    def test_named_library(self):
        if LXML_PRESENT:
            self.assertEqual(registry.lookup('lxml', 'xml'),
                             LXMLTreeBuilderForXML)
            self.assertEqual(registry.lookup('lxml', 'html'),
                             LXMLTreeBuilder)
        if HTML5LIB_PRESENT:
            self.assertEqual(registry.lookup('html5lib'),
                             HTML5TreeBuilder)

        self.assertEqual(registry.lookup('html.parser'),
                         HTMLParserTreeBuilder)

    def test_beautifulsoup_constructor_does_lookup(self):

        with warnings.catch_warnings(record=True) as w:
            # This will create a warning about not explicitly
            # specifying a parser, but we'll ignore it.

            # You can pass in a string.
            BeautifulSoup("", features="html")
            # Or a list of strings.
            BeautifulSoup("", features=["html", "fast"])

        # You'll get an exception if BS can't find an appropriate
        # builder.
        self.assertRaises(ValueError, BeautifulSoup,
                          "", features="no-such-feature")


class RegistryTest(unittest.TestCase):
    """Test the TreeBuilderRegistry class in general."""

    def setUp(self):
        self.registry = TreeBuilderRegistry()

    def builder_for_features(self, *feature_list):
        cls = type('Builder_' + '_'.join(feature_list),
                   (object,), {'features' : feature_list})

        self.registry.register(cls)
        return cls

    def test_register_with_no_features(self):
        builder = self.builder_for_features()

        # Since the builder advertises no features, you can't find it
        # by looking up features.
        self.assertEqual(self.registry.lookup('foo'), None)

        # But you can find it by doing a lookup with no features, if
        # this happens to be the only registered builder.
        self.assertEqual(self.registry.lookup(), builder)

    def test_register_with_features_makes_lookup_succeed(self):
        builder = self.builder_for_features('foo', 'bar')
        self.assertEqual(self.registry.lookup('foo'), builder)
        self.assertEqual(self.registry.lookup('bar'), builder)

    def test_lookup_fails_when_no_builder_implements_feature(self):
        builder = self.builder_for_features('foo', 'bar')
        self.assertEqual(self.registry.lookup('baz'), None)

    def test_lookup_gets_most_recent_registration_when_no_feature_specified(self):
        builder1 = self.builder_for_features('foo')
        builder2 = self.builder_for_features('bar')
        self.assertEqual(self.registry.lookup(), builder2)

    def test_lookup_fails_when_no_tree_builders_registered(self):
        self.assertEqual(self.registry.lookup(), None)

    def test_lookup_gets_most_recent_builder_supporting_all_features(self):
        has_one = self.builder_for_features('foo')
        has_the_other = self.builder_for_features('bar')
        has_both_early = self.builder_for_features('foo', 'bar', 'baz')
        has_both_late = self.builder_for_features('foo', 'bar', 'quux')
        lacks_one = self.builder_for_features('bar')
        has_the_other = self.builder_for_features('foo')

        # There are two builders featuring 'foo' and 'bar', but
        # the one that also features 'quux' was registered later.
        self.assertEqual(self.registry.lookup('foo', 'bar'),
                         has_both_late)

        # There is only one builder featuring 'foo', 'bar', and 'baz'.
        self.assertEqual(self.registry.lookup('foo', 'bar', 'baz'),
                         has_both_early)

    def test_lookup_fails_when_cannot_reconcile_requested_features(self):
        builder1 = self.builder_for_features('foo', 'bar')
        builder2 = self.builder_for_features('foo', 'baz')
        self.assertEqual(self.registry.lookup('bar', 'baz'), None)

# tests/test_docs.py
"Test harness for doctests."
# pylint: disable-msg=E0611,W0142

__metaclass__ = type
__all__ = [
    'additional_tests',
    ]

import atexit
import doctest
import os
#from pkg_resources import (
#    resource_filename, resource_exists, resource_listdir, cleanup_resources)
import unittest

DOCTEST_FLAGS = (
    doctest.ELLIPSIS |
    doctest.NORMALIZE_WHITESPACE |
    doctest.REPORT_NDIFF)


# def additional_tests():
#     "Run the doc tests (README.txt and docs/*, if any exist)"
#     doctest_files = [
#         os.path.abspath(resource_filename('bs4', 'README.txt'))]
#     if resource_exists('bs4', 'docs'):
#         for name in resource_listdir('bs4', 'docs'):
#             if name.endswith('.txt'):
#                 doctest_files.append(
#                     os.path.abspath(
#                         resource_filename('bs4', 'docs/%s' % name)))
#     kwargs = dict(module_relative=False, optionflags=DOCTEST_FLAGS)
#     atexit.register(cleanup_resources)
#     return unittest.TestSuite((
#         doctest.DocFileSuite(*doctest_files, **kwargs)))

# tests/test_html5lib.py
"""Tests to ensure that the html5lib tree builder generates good trees."""

import warnings

try:
    from bs4.builder import HTML5TreeBuilder
    HTML5LIB_PRESENT = True
except ImportError as e:
    HTML5LIB_PRESENT = False
from bs4.element import SoupStrainer
from bs4.testing import (
    HTML5TreeBuilderSmokeTest,
    SoupTest,
    skipIf,
)

@skipIf(
    not HTML5LIB_PRESENT,
    "html5lib seems not to be present, not testing its tree builder.")
class HTML5LibBuilderSmokeTest(SoupTest, HTML5TreeBuilderSmokeTest):
    """See ``HTML5TreeBuilderSmokeTest``."""

    @property
    def default_builder(self):
        return HTML5TreeBuilder()

    def test_soupstrainer(self):
        # The html5lib tree builder does not support SoupStrainers.
        strainer = SoupStrainer("b")
        markup = "

              A bold statement.

              " with warnings.catch_warnings(record=True) as w: soup = self.soup(markup, parse_only=strainer) self.assertEqual( soup.decode(), self.document_for(markup)) self.assertTrue( "the html5lib tree builder doesn't support parse_only" in str(w[0].message)) def test_correctly_nested_tables(self): """html5lib inserts tags where other parsers don't.""" markup = ('' '' "') self.assertSoupEquals( markup, '
              Here's another table:" '' '' '
              foo
              Here\'s another table:' '
              foo
              ' '
              ') self.assertSoupEquals( "" "" "
              Foo
              Bar
              Baz
              ") def test_xml_declaration_followed_by_doctype(self): markup = '''

              foo

              ''' soup = self.soup(markup) # Verify that we can reach the

              tag; this means the tree is connected. self.assertEqual(b"

              foo

              ", soup.p.encode()) def test_reparented_markup(self): markup = '

              foo

              \n

              bar

              ' soup = self.soup(markup) self.assertEqual("

              foo

              \n

              bar

              ", soup.body.decode()) self.assertEqual(2, len(soup.find_all('p'))) def test_reparented_markup_ends_with_whitespace(self): markup = '

              foo

              \n

              bar

              \n' soup = self.soup(markup) self.assertEqual("

              foo

              \n

              bar

              \n", soup.body.decode()) self.assertEqual(2, len(soup.find_all('p'))) def test_reparented_markup_containing_identical_whitespace_nodes(self): """Verify that we keep the two whitespace nodes in this document distinct when reparenting the adjacent tags. """ markup = '
              ' soup = self.soup(markup) space1, space2 = soup.find_all(string=' ') tbody1, tbody2 = soup.find_all('tbody') assert space1.next_element is tbody1 assert tbody2.next_element is space2 def test_reparented_markup_containing_children(self): markup = '' soup = self.soup(markup) noscript = soup.noscript self.assertEqual("target", noscript.next_element) target = soup.find(string='target') # The 'aftermath' string was duplicated; we want the second one. final_aftermath = soup.find_all(string='aftermath')[-1] # The