Excerpt |
---|
I wanted to have a diagram showing the number of hits for a particular search term in Google, Yahoo, Bing and other search engines, and how it changes over time. Naturally, this should be done as a Linux cron job executed at regular intervals. |
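As a sketch, a crontab entry for such a job could look like the following. The script name and paths are hypothetical placeholders for whatever wrapper you build around the commands in this article:

```shell
# hypothetical crontab entry: run once a day at 06:00 and append the
# current hit count to a CSV file for later plotting
0 6 * * * /usr/local/bin/hitcount.sh >> /var/log/hitcount.csv
```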
The first attempt:
No Format |
---|
wget 'http://www.google.de/search?hl=en&q="my+query"'
--2011-02-03 09:29:54-- http://www.google.de/search?hl=en&q=%22my+query%22
Resolving www.google.de... 74.125.79.147, 74.125.79.99, 74.125.79.104
Connecting to www.google.de|74.125.79.147|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2011-02-03 09:29:55 ERROR 403: Forbidden.
|
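As the log shows, wget percent-encodes the double quotes in the query to %22, while the single quotes merely keep the shell from interpreting the & and " characters. The same encoding can be reproduced with standard tools; this is a minimal sketch that handles only quotes and spaces, not general URL encoding:

```shell
# encode a raw query the way the wget log above shows it:
# " becomes %22 and spaces become +
raw='"my query"'
encoded=$(printf '%s' "$raw" | sed -e 's/"/%22/g' -e 's/ /+/g')
echo "$encoded"   # %22my+query%22
```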
...
Google rejects the plain wget request, presumably because of its default User-Agent string. It seems we have to use a real web browser here.
No Format |
---|
lynx 'http://www.google.de/search?hl=en&q="my+query"'
|
works better, but we get a lot of requests for cookies. So, let's accept all cookies by default:
No Format |
---|
lynx -accept_all_cookies 'http://www.google.de/search?hl=en&q="my+query"'
|
Finally, we would like to dump the whole page to stdout instead of running lynx interactively. We then find the interesting bit of information with some extra grep commands:
No Format |
---|
lynx -accept_all_cookies -dump 'http://www.google.de/search?hl=en&q="my+query"' | grep About | grep results
About 3,660,000 results (0.12 seconds)
|
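For plotting, only the bare number is of interest, not the whole sentence. A small post-processing sketch, assuming the English-language output format shown above (a different locale setting would change the wording and break the pattern):

```shell
# strip everything but the first comma-grouped number from the result line
line='About 3,660,000 results (0.12 seconds)'
count=$(printf '%s\n' "$line" | grep -o '[0-9][0-9,]*' | head -n 1 | tr -d ',')
echo "$count"   # 3660000
```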
...
See also: the actual results of this search script (sorry, the page is available in German only).
Related articles
Content by Label |
---|
showLabels | false |
---|
spaces | HOST |
---|
showSpace | false |
---|
sort | modified |
---|
reverse | true |
---|
type | page |
---|
cql | label in ("google","cmd") and type = "page" |
---|
labels | kb-how-to-article |
---|
|