usr/lib/python2.7/site-packages/pip/_vendor/chardet/charsetprober.pyo000064400000010065147205622650021660 0ustar00 abc@sBddlZddlZddlmZdefdYZdS(iNi(t ProbingStatet CharSetProbercBseZdZd dZdZedZdZedZ dZ e dZ e dZ e d ZRS( gffffff?cCs(d|_||_tjt|_dS(N(tNonet_statet lang_filtertloggingt getLoggert__name__tlogger(tselfR((sE/usr/lib/python2.7/site-packages/pip/_vendor/chardet/charsetprober.pyt__init__'s  cCstj|_dS(N(Rt DETECTINGR(R ((sE/usr/lib/python2.7/site-packages/pip/_vendor/chardet/charsetprober.pytreset,scCsdS(N(R(R ((sE/usr/lib/python2.7/site-packages/pip/_vendor/chardet/charsetprober.pyt charset_name/scCsdS(N((R tbuf((sE/usr/lib/python2.7/site-packages/pip/_vendor/chardet/charsetprober.pytfeed3scCs|jS(N(R(R ((sE/usr/lib/python2.7/site-packages/pip/_vendor/chardet/charsetprober.pytstate6scCsdS(Ng((R ((sE/usr/lib/python2.7/site-packages/pip/_vendor/chardet/charsetprober.pytget_confidence:scCstjdd|}|S(Ns([-])+t (tretsub(R((sE/usr/lib/python2.7/site-packages/pip/_vendor/chardet/charsetprober.pytfilter_high_byte_only=scCszt}tjd|}xX|D]P}|j|d |d}|j re|dkred}n|j|q"W|S(s5 We define three types of bytes: alphabet: english alphabets [a-zA-Z] international: international characters [-] marker: everything else [^a-zA-Z-] The input buffer can be thought to contain a series of words delimited by markers. This function works to filter all words that contain at least one international character. All contiguous sequences of markers are replaced by a single space ascii character. This filter applies to all scripts which do not use English characters. s%[a-zA-Z]*[-]+[a-zA-Z]*[^a-zA-Z-]?isR(t bytearrayRtfindalltextendtisalpha(Rtfilteredtwordstwordt last_char((sE/usr/lib/python2.7/site-packages/pip/_vendor/chardet/charsetprober.pytfilter_international_wordsBs      cCst}t}d}xtt|D]}|||d!}|dkrTt}n|dkrit}n|dkr(|j r(||kr| r|j|||!|jdn|d}q(q(W|s|j||n|S(s Returns a copy of ``buf`` that retains only the sequences of English alphabet and high byte characters that are not between <> characters. Also retains English alphabet and high byte characters immediately before occurrences of >. This filter can be applied to all scripts which contain both English characters and extended ASCII characters, but is currently only used by ``Latin1Prober``. iit>ts