Dallycot

A linked open code engine


Linguistics Library

The functions in this library provide simple linguistics processing capabilities.

Functions

clinical-context

(concept: String, sentence: String) →

Determines the clinical context of a concept mentioned in a sentence: whether it is negated, when it applies, and who experiences it. The returned vector contains the following elements, in order:

  1. concept
  2. sentence
  3. negation context (“affirmed”, “negated”, “possible”)
  4. temporality context (“recent”, “hypothetical”, “historical”)
  5. experiencer context (“patient”, “other”)

Examples

clinical-context("pneumonia", "The patient denied a history of pneumonia.") =
< "pneumonia",
  "The patient denied a history of pneumonia.",
  "negated",
  "historical",
  "patient"
>

Implementation

This is implemented internally using Lingua::Context.
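
To make the shape of the result concrete, the following is a minimal Python sketch of a naive trigger-phrase matcher. It is not the Lingua::Context implementation: the trigger lists and the function names are hypothetical, and the sketch ignores trigger scope entirely (any trigger anywhere in the sentence applies to the concept).

```python
import re

# Hypothetical trigger phrases, chosen only for illustration.
NEGATED = ["denied", "denies", "no", "without"]
POSSIBLE = ["possible", "probable", "suspected"]
HISTORICAL = ["history of", "previous"]
HYPOTHETICAL = ["if", "should"]
OTHER_EXPERIENCER = ["mother", "father", "brother", "sister", "family"]


def _has_trigger(sentence, phrases):
    """Return True if any phrase occurs in the sentence as whole words."""
    return any(
        re.search(r"\b" + re.escape(phrase) + r"\b", sentence, re.IGNORECASE)
        for phrase in phrases
    )


def clinical_context(concept, sentence):
    """Return (concept, sentence, negation, temporality, experiencer)."""
    if _has_trigger(sentence, NEGATED):
        negation = "negated"
    elif _has_trigger(sentence, POSSIBLE):
        negation = "possible"
    else:
        negation = "affirmed"

    if _has_trigger(sentence, HISTORICAL):
        temporality = "historical"
    elif _has_trigger(sentence, HYPOTHETICAL):
        temporality = "hypothetical"
    else:
        temporality = "recent"

    experiencer = "other" if _has_trigger(sentence, OTHER_EXPERIENCER) else "patient"
    return (concept, sentence, negation, temporality, experiencer)
```

With these assumed trigger lists, the sentence from the example above yields "negated", "historical", and "patient", because "denied" and "history of" both match and no other experiencer is mentioned.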

classify-text-language

(text: String, languages -> «en») → String

Classifies the text as one of the listed languages and returns the most likely one. Languages recognized by the classifier are listed in language-classifier-languages.

Implementation

This is implemented internally using Lingua::YALI.
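
The idea of picking the most likely language from a list of candidates can be sketched in Python as follows. This is not how Lingua::YALI works internally; a simple marker-word count is swapped in for illustration, and the word lists and function name are hypothetical.

```python
# Hypothetical, tiny marker-word lists; not data from the library.
MARKERS = {
    "en": {"the", "and", "of", "is", "to"},
    "fr": {"le", "la", "et", "les", "est"},
    "de": {"der", "die", "und", "ist", "das"},
}


def classify_text_language(text, languages=("en",)):
    """Return the candidate language whose marker words occur most often."""
    words = text.lower().split()

    def score(language):
        return sum(1 for word in words if word in MARKERS[language])

    # max() returns the first candidate on ties, so order expresses preference.
    return max(languages, key=score)
```

When only one candidate is listed, as with the default here, that candidate is always returned, so the call is only informative with two or more languages.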

sentences

(text: String) →

Splits a text into a vector of sentences. Uses a list of common abbreviations for the text’s language so that a period ending an abbreviation is not treated as the end of a sentence.

Examples

sentences("The big black bug bit the big black bear. Suzy sold seashells by the sea shore. The lazy dog jumped over the crazy cow.") =
< "The big black bug bit the big black bear.",
  "Suzy sold seashells by the sea shore.",
  "The lazy dog jumped over the crazy cow."
>

Implementation

This is implemented internally using Lingua::Sentence.
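
The role of the abbreviation list can be illustrated with a short Python sketch. It is an assumption-laden simplification, not the Lingua::Sentence algorithm: the abbreviation set and the function name are hypothetical, and only ".", "!" and "?" followed by whitespace are considered as boundaries.

```python
import re

# Hypothetical abbreviation list; the library uses per-language lists.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "st", "vs", "etc"}


def split_sentences(text):
    """Split at sentence punctuation unless a period ends a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        candidate = text[start:match.start() + 1]
        last_word = candidate.split()[-1].rstrip(".!?").lower()
        if text[match.start()] == "." and last_word in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation; keep scanning
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)  # whatever remains after the last boundary
    return sentences
```

Without the abbreviation check, a text such as "Dr. Smith arrived. He sat down." would be cut after "Dr.", which is the kind of mid-sentence break the abbreviation list is meant to prevent.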

stop-words

(language: String) →

Lists the default stop words for the given language. Available languages are listed in stop-word-languages.

Implementation

This is implemented internally using Lingua::StopWords.
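
A typical use of a stop-word list is to drop very common words before further processing. The Python sketch below shows that pattern under stated assumptions: the word list is a hypothetical, heavily truncated stand-in for what stop-words would return, and the function name is illustrative.

```python
# Hypothetical, heavily truncated stop-word list for illustration only.
STOP_WORDS = {"en": {"the", "a", "an", "of", "and", "is", "over"}}


def remove_stop_words(text, language="en"):
    """Lowercase the text, strip punctuation, and drop stop words."""
    stop = STOP_WORDS[language]
    tokens = [word.strip(".,;:!?") for word in text.lower().split()]
    return [token for token in tokens if token and token not in stop]
```

For example, remove_stop_words("The lazy dog jumped over the crazy cow.") keeps only "lazy", "dog", "jumped", "crazy", and "cow" with the assumed list.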

Streams

language-classifier-languages

A vector of languages recognized by the language classifier.

Implementation

This is implemented internally.

stop-word-languages

A vector of languages for which stop word lists are available.

Implementation

<<da nl en fi fr de hu it no pt es sv ru>>