Word-definitions on the command-line

Posted 30th Sep 2017 at 07:31 by Michael Uplawski

Tags html, nokogiri, parse, word-definition

A Bash/Ruby script that retrieves word-definitions from an online dictionary on the command-line.

The following resources have inspired this entry and should be helpful to find alternative solutions:

What it does

If the script is installed under the name “define”, calling “define execute” displays the definition of “execute”. [Screenshot: definition received for execute]

Motivation

I have been using the Leo-script to retrieve translations to and from German, French or English. I also wrote a script which helps me conjugate French verbs and another one to retrieve synonyms.

When I replaced LEO by dict, I found the results a little short and the lack of alternative word suggestions downright disturbing. What I needed was the assurance that a proposed translation would also fit the context of my current translation task, i.e. hints on the exact meaning of an English or French word.

The Merriam-Webster online-dictionary (amongst others) fills this gap, but when you call a text-mode browser on a search URL (like https://www.merriam-webster.com/dictionary/execute), the page header and other graphical elements clutter the output.

Global Solution

Extracting interesting sections from HTML-pages is part of what my utility Html2Index already does, so I was happy to find out that the Nokogiri-gem, which I use in some of my Ruby-programs, comes with a command-line utility that serves the very same purpose.

Instead of a shell-script, you can also just define an alias-command for your shell. I prefer the script, as it can be called from the Mutt mail-client, whereas the alias-commands are not available to the command-line interface in Mutt.
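The alias variant could look roughly like the sketch below. It is written as a shell function rather than a literal alias, because an alias can only append its arguments to the end of the command, whereas the search word has to go into the URL at the start of the pipeline. The names define and define_url are assumptions made for this example; the URL and the CSS selector are the ones used in the script further down, and w3m is asked to treat its standard input as HTML with -T text/html.

```shell
# Sketch for ~/.bashrc; the function names are illustrative assumptions.

# Build the lookup URL from the dictionary base URL and the search word.
define_url() {
  printf '%s%s\n' 'https://www.merriam-webster.com/dictionary/' "$1"
}

# Fetch the page, keep only the definition block, show it in w3m.
define() {
  curl -s -L "$(define_url "$1")" \
    | nokogiri -e 'puts $_.at_css("div#definition-wrapper")' \
    | w3m -T text/html
}
```

After reloading the shell configuration, “define execute” should behave much like the script, without the temporary file.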

For displaying the search results which are provided by the online-dictionary, then extracted by Nokogiri, you can use any command-line html-browser; my own choice is w3m.

Alternatively, if you need the output in a graphical window, take a look at yad and its --browser mode. Instead of the HTML-output, you can also choose to display a raw text-version of the search-results and pipe it to any editor or pager that you like.
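The raw-text variant might be sketched as follows. It assumes that calling .text on the extracted node is good enough as a plain-text rendering; define_text is a hypothetical name, and less merely stands in for whatever pager or editor you prefer.

```shell
# Hypothetical helper: print only the text content of the definition block.
define_text() {
  curl -s -L "https://www.merriam-webster.com/dictionary/$1" \
    | nokogiri -e 'puts $_.at_css("div#definition-wrapper").text'
}

# Usage: define_text execute | less
```

Note that at_css returns nil when the selector matches nothing, so for an unknown word the Ruby snippet ends with an error instead of printing text.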

A second picture shows the results of a script that I adapted to the Larousse online-dictionaries. [Screenshot: result from Larousse]

Preconditions

You must have a Ruby-interpreter and the Nokogiri-gem installed. Curl should already be available on your system; otherwise, install it with the help of your package-manager. The same applies to w3m or Lynx, the text-mode browsers.
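To verify these preconditions quickly, a small check like the following can help. This is only a sketch: missing_tools is a hypothetical helper, and the list of commands simply mirrors the tools named above (the Nokogiri-gem provides the nokogiri command).

```shell
# Hypothetical helper: print every command from the argument list
# that cannot be found in PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage: missing_tools ruby nokogiri curl w3m
# No output means that everything is in place.
# A missing nokogiri is usually fixed with: gem install nokogiri
```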

Script


  #!/bin/bash
  # ©2017-2017 Michael Uplawski <michael.uplawski@uplawski.eu>
  # Use this script at your own risk, modify it as you please.
  # But maybe leave the copyright-notice intact. Thank You.
  
  # --------- SOME DEFINITIONS ----------
  # The part of the dictionary-url which PRECEDES AN EXPRESSION to search
  DICT=https://www.merriam-webster.com/dictionary/
  # The command to extract a fragment from a page
  EXTR='puts $_.at_css("div#definition-wrapper")'
  
  # other dictionary
  #DICT=http://www.conjugaison.com/conjuguer.php?verbe=
  #EXTR='puts $_.at_xpath("//table")'
  
  # The browser to pipe the result to
  # BROWSER='/usr/bin/lynx'
  BROWSER='/usr/bin/w3m'
  
  # a temporary HTML-file
  FL=$(mktemp --suffix=.html)
  
  # --------> ACTION <---------
  # Extract the interesting part of the search result, make HTML.
  echo "<html><body>$(curl -s -L "$DICT$1" | nokogiri -e "$EXTR")</body></html>" > "$FL"
  # <-------- END ACTION --------->
  
  # Display
  $BROWSER "$FL"
  # remove temporary file
  unlink "$FL"
  
Ω