I wanted a diagram showing the number of hits for a particular search term on Google, Yahoo, Bing, and other search engines, and how that number changes over time. Naturally, this should run as a Linux cron job executed at regular intervals.


The first attempt, using wget, fails:

No Format

wget 'http://www.google.de/search?hl=en&q="my+query"'
--2011-02-03 09:29:54--  http://www.google.de/search?hl=en&q=%22my+query%22
Resolving www.google.de... 74.125.79.147, 74.125.79.99, 74.125.79.104
Connecting to www.google.de|74.125.79.147|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2011-02-03 09:29:55 ERROR 403: Forbidden.

...

It seems we have to use a real web browser here.

No Format

lynx 'http://www.google.de/search?hl=en&q="my+query"'

works better, but we get a lot of prompts asking whether to accept cookies. So, let's accept all cookies by default:

No Format

lynx -accept_all_cookies 'http://www.google.de/search?hl=en&q="my+query"'

Finally, we would like to dump the whole page to stdout instead of running lynx interactively. We then find the interesting bit of information with some extra grep commands:

No Format

lynx -accept_all_cookies -dump 'http://www.google.de/search?hl=en&q="my+query"' | grep About | grep results
   About 3,660,000 results (0.12 seconds)
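
For the cron job, the matched line still has to be reduced to a plain number and stored with a timestamp. The following is only a minimal sketch of how that could look, not the script actually used for the diagram. The function names `parse_hits` and `log_hits` and the log file argument are hypothetical, and the sketch assumes the result line keeps the "About N results" wording shown above.

```shell
#!/bin/sh
# Sketch only: parse_hits and log_hits are hypothetical helper names.

# Read page text on stdin and print the first "About N results" count
# with the thousands separators removed. Prints nothing if no line matches.
parse_hits() {
    sed -n 's/^.*About \([0-9][0-9,]*\) results.*$/\1/p' | head -n 1 | tr -d ','
}

# Fetch the result page for a query (already in URL form, e.g. my+query)
# and append a "timestamp,count" line to the given log file.
log_hits() {
    query=$1
    logfile=$2
    hits=$(lynx -accept_all_cookies -dump \
        "http://www.google.de/search?hl=en&q=\"$query\"" | parse_hits)
    # Write nothing if the request failed or the page wording changed.
    [ -n "$hits" ] || return 1
    printf '%s,%s\n' "$(date +%Y-%m-%dT%H:%M:%S)" "$hits" >> "$logfile"
}
```

Saved as a script that ends with a call such as `log_hits 'my+query' "$HOME/hits.csv"` (the path is an assumption), it could be run from a crontab entry at whatever interval the diagram needs. The resulting file has one line per run and can be fed to any plotting tool.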

...

See also: the actual results of this search script (sorry, that page is available in German only).
