mechanize's Issues

BeautifulStoneSoup in select_form messes up utf-8

In UTF-8 the character Ü is represented by two bytes, one of which appears as a key in mechanize._beautifulsoup.BeautifulStoneSoup.MS_CHARS.

In Browser.open a subclass of BeautifulStoneSoup called MechanizeBs is used, which overrides BeautifulStoneSoup.PARSER_MASSAGE so that MS_CHARS is ignored.

In Browser.select_form, however, mechanize._form.RobustFormParser is used, which uses BeautifulStoneSoup directly, and that class applies the MS_CHARS replacements. This causes one of the two bytes of the UTF-8 Ü to be replaced, which destroys the character. As a consequence, controls with labels containing Ü can no longer be found by their label, i.e. browser.click(label='Übernehmen') fails with ControlNotFoundError: no control matching kind 'clickable', label 'Übernehmen'.
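For illustration, a minimal check of the mechanism described above (assuming the bundled mechanize._beautifulsoup module and its MS_CHARS table are importable as named):

import mechanize._beautifulsoup as _bs

u_umlaut = u'\xdc'.encode('utf-8')   # 'Ü' in UTF-8 is the two bytes '\xc3\x9c'
for byte in u_umlaut:
    if byte in _bs.BeautifulStoneSoup.MS_CHARS:
        # the second byte also happens to be a cp1252 code that the parser
        # "massages", corrupting the UTF-8 sequence
        print repr(byte), '->', repr(_bs.BeautifulStoneSoup.MS_CHARS[byte])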

I currently worked around that using a monkey patch:

import mechanize
mechanize._form.RobustFormParser.PARSER_MASSAGE = mechanize._html.MechanizeBs.PARSER_MASSAGE

A real fix would be appreciated :). Thx!

Breaks on Delicious login form

This is with the latest mechanize. Just as an example, ClientForm is unable to parse https://delicious.com/login correctly. It fails to pick up the second form which comes right after the <hr/>. If you insert any form right after that <hr/>, it will be omitted from Browser.forms(). If you remove the <hr/>, the form gets picked up.

setuptools install broken

setuptools install of mechanize is broken (whereas pip works)

Facts:

$ man virtualenv 
$ virtualenv --setuptools test
New python executable in test/bin/python
Installing setuptools............done.
$ cd test/
$ . bin/activate
(test)$ bin/easy_install mechanize
Searching for mechanize
Reading http://pypi.python.org/simple/mechanize/
Reading http://wwwsearch.sourceforge.net/mechanize/
Best match: mechanize 0.2.4
Downloading http://wwwsearch.sourceforge.net/mechanize/src/mechanize-0.2.4.tar.gz
Traceback (most recent call last):
  File "bin/easy_install", line 8, in <module>
    load_entry_point('setuptools==0.6c11', 'console_scripts', 'easy_install')()
…
  File "/tmp/test/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/package_index.py", line 553, in _download_to
ValueError: invalid literal for int() with base 10: '382727, 382727'

After investigating, it seems that setuptools uses the download URL found on the PyPI page (whereas pip uses the archive hosted on PyPI). It fails when checking the Content-Length sent by http://wwwsearch.sourceforge.net/mechanize/src/mechanize-0.2.4.tar.gz, which is repeated twice and handed to setuptools as a combined value (where setuptools expects an int):

$ curl -D - http://wwwsearch.sourceforge.net/mechanize/src/mechanize-0.2.4.tar.gz -o /tmp/tmp.tar.gz
Server: Apache/2.2.3 (CentOS)
Last-Modified: Thu, 28 Oct 2010 20:57:05 GMT
ETag: "5d707-493b395976a40"
Content-Length: 382727                                             <-----
Expires: Sat, 02 Apr 2011 07:41:43 GMT
Content-Type: application/x-gzip
Content-Length: 382727                                             <-----
Date: Thu, 31 Mar 2011 07:41:43 GMT
X-Varnish: 58847923
Age: 0
Via: 1.1 varnish
Connection: keep-alive
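For illustration, a minimal sketch of the failure mode (the joining of the repeated header into one comma-separated string is an assumption based on the ValueError text above):

import mimetools
from StringIO import StringIO

headers = mimetools.Message(StringIO(
    "Content-Length: 382727\n"
    "Content-Type: application/x-gzip\n"
    "Content-Length: 382727\n"
    "\n"))
print headers.getheaders("content-length")              # ['382727', '382727']
int(", ".join(headers.getheaders("content-length")))    # ValueError: invalid literal for int()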

No .set_timeout() method on mechanize.UserAgent

There's no .set_timeout() method.

Expect:

  • This should cause the .open() to time out in the same way as providing a timeout argument to .open():

browser = mechanize.Browser()
browser.set_timeout(10.)
browser.open("http://example.com")

  • The argument to .open() should override the .set_timeout() default.

Got: no .set_timeout method
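In the meantime, a workaround sketch using the process-wide socket default timeout (note this affects every socket in the process, not just this browser):

import socket
import mechanize

socket.setdefaulttimeout(10.)    # rough stand-in for the requested set_timeout(10.)
browser = mechanize.Browser()
browser.open("http://example.com")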

Browser.open() hangs if Transfer-Encoding: chunked

Hello there,

I am having issues opening pages whose responses come back with Transfer-Encoding: chunked.

Browser.open() simply hangs without raising any exception. I don't have a stacktrace to show, but here is the debug output of the request:

send: 'GET http://www.tuttosport.com/robots.txt HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.tuttosport.com\r\nConnection: close\r\nUser-Agent: Python-urllib/2.6\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Content-Length: 28
header: ETag: "5417a9-1c-44692ef7da100"
header: Date: Sat, 01 Oct 2011 01:00:29 GMT
header: Last-Modified: Wed, 20 Feb 2008 08:40:04 GMT
header: Expires: Sat, 01 Oct 2011 01:05:29 GMT
header: Server: Apache
header: Accept-Ranges: bytes
header: Content-Type: text/plain
header: Connection: close
send: 'GET http://www.tuttosport.com/ HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.tuttosport.com\r\nConnection: close\r\nUser-Agent: Python-urllib/2.6\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Sat, 01 Oct 2011 01:00:30 GMT
header: Expires: Sat, 01 Oct 2011 01:05:30 GMT
header: Server: Apache
header: Accept-Ranges: bytes
header: Content-Type: text/html
header: Transfer-Encoding: chunked
header: Age: 1
header: Connection: close

No way to pass arbitrary predicate to .click / .submit methods

This should result in the first button being clicked:

import mechanize
browser = mechanize.Browser()
browser.set_response(mechanize.make_response(
        """\
<button type="submit" name="action" value="publish">Publish</button>
<button type="submit" name="action" value="preview">Preview</button>
""",
        [("Content-Type", "text/html")],
        "http://example.com/", 200, "OK"))
form = browser.global_form()
form.click(predicate=lambda control: control.value == "publish")
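A partial workaround sketch with the filters click() already accepts (positional nr rather than an arbitrary predicate):

# nr counts only the matching controls, so this picks the first submit
# button, i.e. the value="publish" one:
request = form.click(type="submitbutton", nr=0)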

Errors when running the tests

Hi, I was trying to package this for Gentoo. This is what my test run gives me:

/var/tmp/portage/dev-python/mechanize-0.2.0/work/mechanize-0.2.0/test/test_api.py:6: SyntaxWarning: import * only allowed at module level
def test_import_all(self):
test-tools/testprogram.py:401: UserWarning: Skipping functional tests: Failed to import twisted.web2 and/or zope.interface
warnings.warn("Skipping functional tests: Failed to import "

(After that, all tests are either skipped or pass. Not sure the UserWarning here is good, I'd prefer to just have a bunch of skipped tests.)

No way to give data as bytes to FileControl

FileControl assumes that the data to include in the field of the form comes from a file on disk. It should also allow adding a file from a byte array source.

In most environments there's an easy workaround for the current limitation: just write a temporary file. Unfortunately, Google AppEngine doesn't allow writing files. (And I want to take data from a db.Blob and upload it to a web form.)

If there's another workaround I haven't thought of, please let me know. Maybe a proxy class that works enough like a file but is sourced from a byte array?
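One possible workaround sketch, assuming HTMLForm.add_file accepts any file-like object rather than only a real file on disk (the URL, form name and control name below are hypothetical):

from StringIO import StringIO
import mechanize

br = mechanize.Browser()
br.open("http://example.com/upload")             # hypothetical upload page
br.select_form(name="upload_form")               # hypothetical form name
data = "bytes pulled from a db.Blob"             # any in-memory byte string
br.form.add_file(StringIO(data),
                 content_type="application/octet-stream",
                 filename="blob.bin",            # filename reported to the server
                 name="file")                    # hypothetical file control name
br.submit()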

Use absolute imports

Require Python 2.5 and use the absolute and relative imports features.

The reason for using absolute imports is described in PEP 328.
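For illustration, a minimal sketch of the PEP 328 style this asks for, inside a package module (the imported names are just examples from mechanize's layout):

from __future__ import absolute_import   # needed while supporting Python 2.5/2.6

import urllib2                 # always the stdlib module, never a sibling
from . import _form            # explicit relative import of a sibling module
from ._html import Factory     # explicit relative import of a single name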

Quoted cookies get wrongly? escaped

In _clientcookie.py:_cookie_attrs(), cookie values that contain non-word (\W) characters get their double quotes explicitly escaped.

This changes the value of quoted cookies when they are sent back to the webapp. For example:

A cookie comes in with the key/value pair: hello => "world"
The quote substitution turns this into: hello => \"world\"
Wireshark then shows the following being sent on the next request:

Cookie: hello=\"world\"; $Path="/"; $Domain=".some.testdomain.com"

When I comment out this part of the code:

            # quote cookie value if necessary
            # (not for Netscape protocol, which already has any quotes
            #  intact, due to the poorly-specified Netscape Cookie: syntax)
            #if ((cookie.value is not None) and
            #    self.non_word_re.search(cookie.value) and version > 0):
            #    value = self.quote_re.sub(r"\\\1", cookie.value)
            #else:
            value = cookie.value

Everything works as expected.

I'm not quite sure what the comment means by 'not for Netscape protocol'; should there be an extra check in there for Netscape/Mozilla-style cookies?

I've checked the webapp (not mine) and it doesn't appear to be doing anything that isn't understood by any browser. The cookies as given by the webapp work as expected in every browser.

mechanize.Browser() not working with a SOCKS proxy on port 27977

import socks
import socket
import mechanize

socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "76.73.239.33", 27977)
socket.socket = socks.socksocket

br = mechanize.Browser()

br.open("https://www.google.com")

Traceback (most recent call last):
  File "tets1.py", line 16, in <module>
    br.open("https://www.google.com")
  File "build\bdist.win32\egg\mechanize\_mechanize.py", line 203, in open
  File "build\bdist.win32\egg\mechanize\_mechanize.py", line 230, in _mech_open
  File "build\bdist.win32\egg\mechanize\_opener.py", line 188, in open
  File "build\bdist.win32\egg\mechanize\_http.py", line 316, in http_request
  File "build\bdist.win32\egg\mechanize\_http.py", line 242, in read
  File "build\bdist.win32\egg\mechanize\_mechanize.py", line 203, in open
  File "build\bdist.win32\egg\mechanize\_mechanize.py", line 230, in _mech_open
  File "build\bdist.win32\egg\mechanize\_opener.py", line 193, in open
  File "build\bdist.win32\egg\mechanize\_urllib2_fork.py", line 344, in _open
  File "build\bdist.win32\egg\mechanize\_urllib2_fork.py", line 332, in _call_chain
  File "build\bdist.win32\egg\mechanize\_urllib2_fork.py", line 1170, in https_open
  File "build\bdist.win32\egg\mechanize\_urllib2_fork.py", line 1115, in do_open
  File "V:\python\python.v2.54_portable\App\lib\httplib.py", line 866, in request
    self._send_request(method, url, body, headers)
  File "V:\python\python.v2.54_portable\App\lib\httplib.py", line 889, in _send_request
    self.endheaders()
  File "V:\python\python.v2.54_portable\App\lib\httplib.py", line 860, in endheaders
    self._send_output()
  File "V:\python\python.v2.54_portable\App\lib\httplib.py", line 732, in _send_output
    self.send(msg)
  File "V:\python\python.v2.54_portable\App\lib\httplib.py", line 699, in send
    self.connect()
  File "V:\python\python.v2.54_portable\App\lib\httplib.py", line 1134, in connect
    sock.connect((self.host, self.port))
  File "V:\python\python.v2.54_portable\App\lib\site-packages\socks.py", line 369, in connect
    self.__negotiatesocks5(destpair[0],destpair[1])
  File "V:\python\python.v2.54_portable\App\lib\site-packages\socks.py", line 236, in __negotiatesocks5
    raise Socks5Error(ord(resp[1]),_generalerrors[ord(resp[1])])
TypeError: __init__() takes exactly 2 arguments (3 given)

Exit code: 1

RobustFactory fails to use BeautifulSoup to parse forms

The problem is: in RobustFactory, the FormsFactory is in fact still a default FormsFactory, not the BeautifulSoup-based one.

In _html.py, around line 423, when constructing RobustFormsFactory, the assignment of the form parser class does not take effect; printing out form_parser_class in FormsFactory's constructor shows it as "None".

So I added one line (marked below) to solve it, just a quick fix, hope you can update it in the next version:

class RobustFormsFactory(FormsFactory):
    def __init__(self, *args, **kwds):
        args = form_parser_args(*args, **kwds)
        if args.form_parser_class is None:
            args.form_parser_class = RobustFormParser
            args.dictionary['form_parser_class'] = RobustFormParser  # added line
        FormsFactory.__init__(self, **args.dictionary)

Parser error when using Browser.follow_link(url_regex=)

The parser throws an error (ParseError: expected name token at "<!';\npixiv.context.u") when using follow_link(url_regex='URL_PATTERN').

The "<!" is inside javascript string variable, not for denoting html comment, the full script in here:

....

<script> pixiv.context.illustId = '14245299'; pixiv.context.illustTitle = 'Go to school>///

...

No exception is thrown if RobustFactory() is used.

Traceback with multiple content-type headers

From Felix Heß

Trying to read www.cortalconsors.de with mechanize fails. The problem is in _http.py, in the function http_response (line 197). Calling

ct_hdrs = http_message.getheaders("content-type")

returns [''] sometimes. Then is_html(ct_hdrs, url, self._allow_xhtml) fails.

proposed bugfix:

if '' in ct_hdrs:
    ct_hdrs.remove('')

before calling

if is_html(ct_hdrs, url, self._allow_xhtml):

I hope this information helps you to resolve the bug.

Best regards
Felix

Infinite loop on self-refreshing pages

Issue: Calling br.open(url) enters an infinite refresh loop if the page has a refresh header pointing to itself.

Reasons:

  • The default arguments to HTTPRefreshProcessor follow all refreshes, after waiting the page-requested amount of time.
  • Since the page is fetched correctly, adding a timeout parameter to open() does nothing. (Upon reflection, this seems like correct behavior to me. However, it is quite counter-intuitive to a library user, especially since there's no indication of why mechanize is hanging.)

In my case, I don't care about refresh headers, so I simply changed the default arguments at _useragent.py:107.
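For reference, a sketch of getting the same effect without editing the library, via Browser.set_handle_refresh:

import mechanize

br = mechanize.Browser()
# Either stop following Refresh headers entirely...
br.set_handle_refresh(False)
# ...or keep following them, but never sleep for the page-requested delay
# and give up on long refresh intervals:
# br.set_handle_refresh(True, max_time=1, honor_time=False)
br.open("http://example.com/")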

Possible Solutions:

  • Allow customization of refresh header behavior in the Browser object.
  • Ignore header refreshes after the browser timeout has passed.

Thoughts?

(Thanks for mechanize, btw, it's a fantastic piece of software!)

AttributeError raised instead of ParseError

To reproduce:

  • Call urlopen on a document that causes ParseError to be raised internally

Expect: ParseError

Got: AttributeError (from Elaine Angelino):

In [46]: from mechanize import Browser

In [47]: br = Browser()

In [48]: br.open('http://www.walgreens.com/marketing/storelocator/find.jsp')
Out[48]: <response_seek_wrapper at 0x1b7a080 whose wrapped object =
<closeable_response at 0x1c8b170 whose fp = <socket._fileobject object at
0x1b846b0>>>

In [49]: br.forms()

ParseError Traceback (most recent call last)

/Users/elaineangelino/gotdata/Temp/ in ()

/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mechanize-0.1.11-py2.6.egg/mechanize/_mechanize.pyc
in forms(self)
424 if not self.viewing_html():
425 raise BrowserStateError("not viewing HTML")
--> 426 return self._factory.forms()
427
428 def global_form(self):

/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mechanize-0.1.11-py2.6.egg/mechanize/_html.pyc
in forms(self)
557 try:
558 self._forms_genf = CachingGeneratorFunction(
--> 559 self._forms_factory.forms())
560 except: # XXXX define exception!
561 self.set_response(self._response)

/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mechanize-0.1.11-py2.6.egg/mechanize/_html.pyc
in forms(self)
226 )
227 except ClientForm.ParseError, exc:
--> 228 raise ParseError(exc)
229 self.global_form = forms[0]
230 return forms[1:]

<type 'str'>: (<type 'exceptions.AttributeError'>,
AttributeError("'ParseError' object has no attribute 'msg'",))

In [50]:

``<br/>`` in form makes following control invisible

Reading this page:


with this script:

import mechanize
br = mechanize.Browser()
url = r'file:///home/catherine/Music/badform.html'
br.open(url)
br.select_form('login')
br['passwd'] = 'no problem'
br['username'] = 'problem'

I get:

catherine@dellzilla:~/Music$ python mechbug.py
Traceback (most recent call last):
File "mechbug.py", line 7, in
br['username'] = 'problem'
File "/usr/local/lib/python2.6/dist-packages/ClientForm-0.2.10-py2.6.egg/ClientForm.py", line 2895, in setitem
control = self.find_control(name)
File "/usr/local/lib/python2.6/dist-packages/ClientForm-0.2.10-py2.6.egg/ClientForm.py", line 3222, in find_control
return self._find_control(name, type, kind, id, label, predicate, nr)
File "/usr/local/lib/python2.6/dist-packages/ClientForm-0.2.10-py2.6.egg/ClientForm.py", line 3306, in _find_control
raise ControlNotFoundError("no control matching "+description)
ClientForm.ControlNotFoundError: no control matching name 'username'

Examining the form controls shows that the submit and passwd controls are present, but the username field is absent from form.controls.

Removing <br/> from the form fixes the problem. In fact, even changing <br/> to <br /> (inserting a space) fixes the problem. Unfortunately, I can't stop the form authors of the world from sticking <br/> in their forms!

Mechanize cannot handle forms with disabled inputs with image type

Here is simple HTML to reproduce it: a form named 'f' containing a disabled image input named 'i', roughly:

<html>
<head><title></title></head>
<body>
<form name="f">
<input type="image" name="i" disabled="disabled" />
</form>
</body>
</html>
This is a simple Python example to reproduce the problem:

import mechanize
br = mechanize.Browser()
br.open_local_file('test.html')
br.select_form('f')

This example crashes with AttributeError: control 'i' is disabled on form selection. The root cause is the following line (_form.py, line 2336):

if self.value is None: self.value = ""

in SubmitControl constructor.

Stop bundling beautifulsoup

Just making sure you are aware that this bug was reported on the debian mechanize package:

 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=555349

Debian tries to prevent duplication of code in its archive, mainly for security reasons.

No good way to tell when a timeout occurred

Got: When you request a timeout using the timeout parameter to urlopen (or Browser.open), in order to tell that a timeout occurred, you have to use a poorly-defined interface like HTTPError.reason, using code like this:

import mechanize
import socket

br = mechanize.Browser()
try:
    br.open("http://python.org/", timeout=0.001)
except mechanize.URLError, exc:
    if isinstance(exc.reason, socket.timeout):
        print "timeout occurred"

Expect: There's some clearly defined interface for finding out that a timeout imposed by the socket module occurred.

Add support for HTML parsing libraries

Python libraries for parsing HTML have improved. mechanize doesn't support three of the most popular choices of the current crop.

Expect: can use some mechanize API to request that one of these libraries is used to parse HTML:

  • lxml.html
  • BeautifulSoup 3
  • html5lib

Got: can only use bundled BeautifulSoup v.2 or Python's sgmllib or SGMLParser modules.

Link attributes is returning a list of tuples

I was expecting it to return attributes as a dictionary, like forms do. I changed a single line of code in _html.py and now it appears to be working...

On line 190 of _html.py changed token.attrs to attrs...
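In the meantime, a minimal workaround sketch that views the list of (name, value) tuples as a dictionary without patching _html.py:

import mechanize

br = mechanize.Browser()
br.open("http://example.com/")
link = br.links().next()      # Link.attrs is currently a list of (name, value) tuples
attrs = dict(link.attrs)      # treat it as a dictionary instead
print attrs.get("href")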

bug in ftpwrapper

I notice that if I try to follow an ftp:// type link, there is a crash.
This is because _urllib2_fork.py imports ftpwrapper from urllib.
It expects 6 arguments, but throughout the _urllib2_fork.py file an extra timeout argument is passed, which causes it to choke...

forgot ``seek(0)'' in mechanize ``RobustFactory.set_response()''?

In response to this bug:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=456944

The debian mechanize package is carrying this patch:

 http://svn.debian.org/viewsvn/pkg-zope/python-mechanize/trunk/debian/patches/mechanize_seek.dpatch?revision=2231&view=markup

It seems not to have been applied to recent versions of mechanize. It would be nice to get rid of that patch one way or another.

Mechanize doesn't work with cookies with an empty "path" attribute

If a cookie has its path attribute set to empty, mechanize thinks it is incorrect and bypasses it.
But all modern browsers (IE, Firefox, Chrome) work correctly with empty path attributes.
I have a quick patch:

diff --git a/mechanize/_clientcookie.py b/mechanize/_clientcookie.py
index 2ed4c87..2af778a 100644
--- a/mechanize/_clientcookie.py
+++ b/mechanize/_clientcookie.py
@@ -1291,6 +1291,9 @@ class CookieJar:
                         # is a request to discard (old and new) cookie, though.
                         k = "expires"
                         v = self._now + v
+                if k == "path":
+                    if v is None:
+                        v = "/"
                 if (k in value_attrs) or (k in boolean_attrs):
                     if (v is None and
                         k not in ["port", "comment", "commenturl"]):

URL fragments in links are not handled

I have a link with a fragment in it, for example:

<a href="/somepage#header">More info</a>

If I click on such a link using mechanize, I always get a 404. The problem appears to be that the fragment is not removed from the URL before the request is created. This should probably be done in Browser.click_link, with a simple link.absolute_url.split("#", 1)[0] or something similar.
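Until then, a workaround sketch that strips the fragment by hand before following the link (the page URL and link text are just the example above):

import mechanize

br = mechanize.Browser()
br.open("http://example.com/")               # hypothetical page containing the link
link = br.find_link(text="More info")
url = link.absolute_url.split("#", 1)[0]     # drop the '#header' fragment
br.open(url)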

str(mechanize.ParseError()) traceback

To reproduce:
print str(mechanize.ParseError("spam"))

Expect: "spam" printed
Got:
File "/usr/lib/python2.6/HTMLParser.py", line 59, in str
result = self.msg
AttributeError: 'ParseError' object has no attribute 'msg'

HTTPS CONNECT proxies not supported

Python 2.6 supports the CONNECT method for establishing HTTPS connections through a web proxy.

To reproduce: attempt to mechanize.urlopen() an https: URL served by a remote host when the only route to the web from your host is through an HTTP proxy that supports the CONNECT method.

Expect: can fetch page

Got: fetch fails due to failure to connect to the remote host
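For context, a sketch of the reproduction path (the proxy host and port are hypothetical):

import mechanize

br = mechanize.Browser()
br.set_proxies({"https": "proxy.example.com:3128"})   # CONNECT-capable HTTP proxy
br.open("https://example.com/")   # expected to be fetched via CONNECT; currently fails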

Cannot select a textarea within a form but within a <noscript> tag

I have the following code:

<form action="blabla" blabla >
<input 1 type=blah>
<input 2 type=blah2> etc
<noscript>
    <textarea name="prda" rows="3" cols="40"></textarea>
</noscript>

I want to fill out that textarea, preferably with mechanize (in Python); however, form["prda"] always gives me a control-not-found error. A user on StackOverflow suggested that mechanize cannot parse controls that are within a <noscript> tag, which seems kind of odd to me. Is this true?

easy_install error on 2.4 due to yield in try/finally in _firefox3cookiejar.py

easy_install "fails" in both cases, with either the regular release or ==dev; same problem with Python 2.4 on Ubuntu:

easy_install-2.4 -U mechanize
Searching for mechanize
Reading http://pypi.python.org/simple/mechanize/
Reading http://wwwsearch.sourceforge.net/mechanize/
Best match: mechanize 0.1.11
Downloading http://wwwsearch.sourceforge.net/mechanize/src/mechanize-0.1.11.zip
Processing mechanize-0.1.11.zip
Running mechanize-0.1.11/setup.py -q bdist_egg --dist-dir /tmp/easy_install-IwwENn/mechanize-0.1.11/egg-dist-tmp-iOvYrh
no previously-included directories found matching 'docs-in-progress'
File "build/bdist.linux-i686/egg/mechanize/_firefox3cookiejar.py", line 91
yield row
SyntaxError: 'yield' not allowed in a 'try' block with a 'finally' clause
Adding mechanize 0.1.11 to easy-install.pth file

Installed /usr/local/lib/python2.4/site-packages/mechanize-0.1.11-py2.4.egg
Processing dependencies for mechanize
Finished processing dependencies for mechanize

using non-existent label for click() still submits the form (including FIX)

No control with label 'asdfasdfasdf' exists. Mechanize still submits the form.

browser.forms().next().click(label='asdfasdfasdf')

The following seems to fix the issue (version 0.2.0). In _form.py, line 3190, add to the condition:

or (label is not None)

With this fix the above click() call raises

ControlNotFoundError: no control matching kind 'clickable', label 'asdfasdfasdf'

Selecting a single form amongst many

I've been trying to select the 2nd form out of 20+ forms on a page. It happens to be the only form on the page with the name 'send_form'.

I've tried

br.select_form(nr=1)

and

br.select_form(name='send_form')

and

for f in br.forms():
    if f.name != None:
        br.select_form(name=f.name)

The first results in every single form object from the page. The second returns a "no form by that name" error. And the third also returns every single form object on the page.

There are three fields I'm trying to access: name=prospect_email[], name=prospect_name[], name=prospect_telephone[]. These input fields also have ids with the same names, minus the []. I've successfully input data into fields on other forms, so I know how to do it, but when I try to access these I get an error saying the name of ... does not exist. I figure it's probably because I don't have the right form selected. I've spent hours on this and I'm racking my brain trying to figure it out. Help will be appreciated.

value for button not submitted to server

I have a form with a couple of submit buttons which look like this:

<button type="submit" name="action" value="publish">Publish</button>
<button type="submit" name="action" value="preview">Preview</button>

When I click on one of those buttons mechanize submits the form, but does not include an action value in the request data.

Debugging this shows that this goes wrong in ScalarControl._totally_ordered_pairs() for the SubmitButtonControl instance: disabled is set to True, so no pair is returned.
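A workaround sketch based on that diagnosis: re-enable the parsed button controls before submitting so their name/value pair is included (the URL and form selection below are hypothetical):

import mechanize

br = mechanize.Browser()
br.open("http://example.com/edit")           # hypothetical page containing the form
br.select_form(nr=0)
for control in br.form.controls:
    if control.type == "submitbutton":       # the <button type="submit"> controls
        control.disabled = False
response = br.submit(name="action", nr=0)    # click the value="publish" button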

ProxyHandler is missing features/fixes from Python 2.6

Expect: All proxy features available in Python 2.6 are available in mechanize.

Got: proxy bypass settings (e.g. no_proxy environment variable) are ignored (and probably bug fixes, and perhaps other changes are missing).

No equality operator on Cookie

mtamizi reports that the lack of an equality operator makes it awkward to use pickled cookies in sqlalchemy.

Expect: This is true: mechanize.Cookie(**args) == mechanize.Cookie(**args)
Got: It isn't

Test case using sqlalchemy: (fails with Python 2.7 and sqlalchemy 0.6.3): http://gist.github.com/550319
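A workaround sketch in the meantime, comparing the public Cookie attributes by hand (the attribute list is an assumption about what equality should mean for this use case):

def cookies_equal(a, b):
    # mechanize.Cookie exposes these as plain attributes
    attrs = ("version", "name", "value", "port", "domain", "path",
             "secure", "expires", "discard", "comment", "comment_url")
    return all(getattr(a, attr) == getattr(b, attr) for attr in attrs)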

Redirect request must visit if original request does

If you navigate to a page A that redirects to page B, the page you visit is page B. That is the page that must be added to the history, and when reload()ing the browser it is page B that must be requested, not page A.

This is really problematic if page A is a submitted form.

The fix is easy; however, it breaks a doctest, but as explained above that test seems to be based on a false assumption.

Patch: http://paste.pocoo.org/show/229138/

For testing purpose, here is a simple server and a test script
http://paste.pocoo.org/show/229139/
http://paste.pocoo.org/show/229140/

Add an easy way to submit forms without "clicking" on control

If you automate an ASP.NET site, quite often you have to "emulate" JavaScript handlers in your Python code. I have seen a couple of cases where the submit should happen after clicking on an A tag, while at the same time the form has a clickable control.

Even if I update the required hidden controls (__EVENTTARGET) and do browser.form.submit() without arguments, mechanize "emulates" a click on the first clickable control and I get the wrong result.

It would be very useful if I could pass some special argument value to HTMLForm.click which would result in running the HTMLForm._switch_click method even if there are clickable controls in the form.

Browser.retrieve, original filename and incomplete httplib.HTTPMessage RFC822 header parsing

I had some issues with Browser.retrieve and original filenames, at least in Python 2.6:

  1. Browser.retrieve(someurl) returns a (tmp_filename, httplib.HTTPMessage) tuple, with a temporary filename from tempfile.mkstemp;
  2. Browser.retrieve(someurl, filename) returns a (filename, httplib.HTTPMessage) tuple;
  3. but there's no way to get the original filename, even if it's present in the 'Content-disposition: attachment; filename="abcd.xyz"' httplib.HTTPMessage header.

That's not really mechanize's fault: to extract those header parameters, httplib.HTTPMessage is missing a crucial 'get_filename' method, or a more generic 'get_param' method, both of which are present in the email.message.Message class. httplib.HTTPMessage does have a 'getparam' method, but unfortunately it's only used/usable for 'content-type' header parsing.

I submitted an issue on the Python tracker (http://bugs.python.org/issue11316) and proposed a 'monkeypatch_http_message' decorator as a workaround, so we can do:

import mechanize 
from some.module import monkeypatch_http_message 

browser = mechanize.Browser() 
(tmp_filename, headers) = browser.retrieve(someurl) 

# monkeypatch the httplib.HTTPMessage instance 
monkeypatch_http_message(headers) 

# yeah... my original filename, finally 
filename = headers.get_filename() 

Once again, that's the situation in Python 2.6. According to http://bugs.python.org/issue4773, httplib.HTTPMessage in Python 3.x is using email.message.Message underneath.

(ps: this is an edited repost of issue 35, that I closed by mistake...)

Traceback on unknown encoding

To reproduce:
import mechanize
import mechanize._response

response = mechanize._response.test_response(
    "&lt;",
    headers=[("Content-type", "text/html; charset=\"bogus\"")])
browser = mechanize.Browser()
browser.set_response(response)
browser.forms()

Expect: no traceback (falls back to default encoding)

Got:
Traceback (most recent call last):
File "/home/john/dev/tst.py", line 93, in
browser.forms()
File "/home/john/dev/mechanize/mechanize/_mechanize.py", line 420, in forms
return self._factory.forms()
File "/home/john/dev/mechanize/mechanize/_html.py", line 549, in forms
self._forms_factory.forms())
File "/home/john/dev/mechanize/mechanize/_html.py", line 229, in forms
_urlunparse=_rfc3986.urlunsplit,
File "/home/john/dev/mechanize/mechanize/_form.py", line 844, in ParseResponseEx
_urlunparse=_urlunparse,
File "/home/john/dev/mechanize/mechanize/_form.py", line 981, in _ParseFileEx
fp.feed(data)
File "/home/john/dev/mechanize/mechanize/_form.py", line 758, in feed
_sgmllib_copy.SGMLParser.feed(self, data)
File "/home/john/dev/mechanize/mechanize/_sgmllib_copy.py", line 110, in feed
self.goahead(0)
File "/home/john/dev/mechanize/mechanize/_sgmllib_copy.py", line 199, in goahead
self.handle_entityref(name)
File "/home/john/dev/mechanize/mechanize/_form.py", line 650, in handle_entityref
'&%s;' % name, self._entitydefs, self._encoding))
File "/home/john/dev/mechanize/mechanize/_form.py", line 143, in unescape
return re.sub(r"&#?[A-Za-z0-9]+?;", replace_entities, data)
File "/usr/lib/python2.6/re.py", line 151, in sub
return _compile(pattern, 0).sub(repl, string, count)
File "/home/john/dev/mechanize/mechanize/_form.py", line 135, in replace_entities
repl = repl.encode(encoding)
LookupError: unknown encoding: bogus
