Word-definitions on the command-line


Posted 30th Sep 2017 at 07:31 by Michael Uplawski

Tags html, nokogiri, parse, word-definition


A Bash/Ruby script that retrieves word-definitions from an online dictionary on the command-line.

The following resources have inspired this entry and should be helpful to find alternative solutions:

What it does

If the script is called as “define”:

define execute

(Screenshot: definition received for “execute”)

Motivation

I have been using the Leo-script to retrieve translations to and from German, French and English. I have also written one script which helps me conjugate French verbs and another one which retrieves synonyms.

When I replaced LEO by dict, I found the results a little short and the lack of alternative word-suggestions downright disturbing. What I needed was the assurance that a proposed translation would also fit the context of my current translation task, i.e. hints on the exact meaning of an English or French word.

The Merriam-Webster online-dictionary (amongst others) fills this gap, but when you call a text-mode browser on a search URL (like https://www.merriam-webster.com/dictionary/execute), the head of the page and other graphical elements clutter the output.

Global Solution

Extracting interesting sections from html-pages is part of what my utility Html2Index already does, so I was happy to find out that the Nokogiri-gem, which I use in some of my Ruby-programs, comes with a command-line utility serving the very same purpose.

Instead of a shell-script, you can also just define an alias-command for your shell. I prefer the script, as it can be called from the Mutt mail-client, whereas the alias-commands are not available to the command-line interface in Mutt.
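As a rough sketch of the alias route: a plain alias can only append its arguments at the end of the command, while the search term has to go into the URL in the middle of the pipeline, so a shell function in ~/.bashrc is the usual stand-in. The URL and the extraction command below are copied from the script in this post; the function name "definir" is made up for illustration, and piping straight into w3m instead of going through a temporary file is my assumption, not the way the script does it.

```shell
# Values taken from the script in this post.
DICT='https://www.larousse.fr/dictionnaires/francais/'
EXTR='puts $_.at_css("article.content")'

# Hypothetical function name; place the definition in ~/.bashrc.
definir() {
  # Refuse to run without a search term.
  [ $# -gt 0 ] || { echo 'usage: definir WORD' >&2; return 1; }
  # Fetch the page, keep the interesting fragment, render it in w3m.
  curl -L -m10 -s "${DICT}$1" | nokogiri -e "$EXTR" | w3m -T text/html
}
```

Called as "definir maison", this should behave much like the script, but it remains unavailable from within Mutt for the reason given above.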

To display the search results, which are provided by the online-dictionary and then extracted by Nokogiri, you can use any command-line html-browser; my own choice is w3m.

Alternatively, if you need the output in a graphical window, take a look at yad and its --browser mode. Instead of the html-output, you can choose to display a raw text-version of the search-results and pipe it to any editor or pager that you like.
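One way to get the raw text-version is w3m's -dump option, which renders the page to plain text on standard output instead of opening the interactive browser. The sketch below assumes w3m is installed; the function name "as_text" and the fallback to less are my own choices.

```shell
# Render HTML from standard input as plain text and hand it to a pager.
# ${PAGER:-less} is left unquoted on purpose, so that PAGER may carry options.
as_text() {
  w3m -dump -T text/html | ${PAGER:-less}
}

# Usage, with DICT and EXTR defined as in the script in this post:
#   curl -L -m10 -s "$DICT$1" | nokogiri -e "$EXTR" | as_text
```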

Here is a second picture showing the results with a script that I adapted to the Larousse online-dictionaries:

(Screenshot: result from Larousse)

Preconditions

You must have a ruby-interpreter and the Nokogiri-gem installed. Curl should already be available on your system, otherwise install it with the help of your package-manager. The same applies to W3M or Lynx, the text-mode browsers.
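A quick way to verify these preconditions before the first run is to ask the shell whether each command can be found. This is only a sketch; the function name "missing_tools" is hypothetical, and you would replace w3m by lynx in the list if that is your browser.

```shell
# Print the name of every given command that is not found on the PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# The tools which the script in this post relies on.
missing=$(missing_tools ruby nokogiri curl w3m)
if [ -n "$missing" ]; then
  echo "Please install: $missing" >&2
fi
```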

Script

#!/bin/bash
# ©2017-2018 Michael Uplawski <michael.uplawski@uplawski.eu>
# Use this script at your own risk, modify it as you please.
# But maybe leave the copyright-notice intact. Thank You.

if [ $# -eq 0 ]
then
  echo "ERROR! Missing argument!" >&2
  exit 1
fi

# --------- SOME DEFINITIONS ----------
# The Name of the dictionary:
DICT_NAME='Larousse http://www.larousse.fr'
# The part of the dictionary-url which PRECEDES AN EXPRESSION to search
DICT=https://www.larousse.fr/dictionnaires/francais/
# The command to extract a fragment from a page
EXTR='puts $_.at_css("article.content")'

# other dictionary
#DICT_NAME='Conjugaison.com'
#DICT=http://www.conjugaison.com/conjuguer.php?verbe=
#EXTR='puts $_.at_xpath("//article")'

# The browser to pipe the result to
# BROWSER='/usr/bin/lynx'
BROWSER='/usr/bin/w3m'

# a temporary HTML-file
FL=$(mktemp --suffix=.html)

# --------> ACTION <---------
# Extract the interesting part of the search result, make HTML.
echo "<html><body><h1>$DICT_NAME</h1>$(curl -L -m10 -s "$DICT$1" | nokogiri -e "$EXTR")</body></html>" > "$FL"
# <-------- END ACTION --------->

# Display
"$BROWSER" "$FL"
# remove temporary file
rm -f "$FL"
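
The script removes its temporary file only when it reaches its last line. If it ends earlier for some reason, the file may stay behind. One possible hardening, which is not part of the script above, is to register the removal in an EXIT trap right after the file has been created. The helper name "with_tempfile" is made up for this sketch, and mktemp --suffix assumes GNU coreutils, as the script does.

```shell
# Run a command with the path of a fresh temporary HTML-file appended as its
# last argument; the file is removed when the subshell exits, however it ends.
with_tempfile() {
  (
    tmp=$(mktemp --suffix=.html) || exit 1
    trap 'rm -f "$tmp"' EXIT
    "$@" "$tmp"
  )
}

# Usage: show that the file exists while the command is running.
with_tempfile ls -l
```

In the script, the same idea comes down to a single line, trap 'rm -f "$FL"' EXIT, placed directly after the call to mktemp.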