Yahoo! UI Library

text 3.3.0

static Class Text.WordBreak

Provides utility methods for splitting strings on word breaks and determining whether a character index represents a word boundary, using the generic word breaking algorithm defined in the Unicode Text Segmentation guidelines (Unicode Standard Annex #29).

This algorithm provides a reasonable default for many languages. However, it does not cover language- or context-specific requirements, and it does not provide meaningful results for languages that don't use spaces between words, such as Chinese, Japanese, Thai, Lao, Khmer, and others. Server-based word breaking services usually provide significantly better results with better performance.

Methods

_classify

protected static Array _classify ( string )
Returns a character classification map for the specified string.
Parameters:
string <String> String to classify.
Returns: Array
Classification map.
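
To make the idea of a classification map concrete, here is a minimal sketch, assuming the map is an array with one entry per character. The function name sketchClassify and the category labels are hypothetical; the values the library actually stores in its map are not described on this page, and the real categories follow Unicode Standard Annex #29 rather than these ASCII-only tests.

```javascript
// Hypothetical category labels, for illustration only.
var LETTER = 'letter', DIGIT = 'digit', SPACE = 'space', OTHER = 'other';

// Hypothetical stand-in for a classifier: returns one label per character.
function sketchClassify(string) {
  var map = [];
  for (var i = 0; i < string.length; i++) {
    var ch = string.charAt(i);
    if (/[A-Za-z]/.test(ch)) {
      map.push(LETTER);
    } else if (/[0-9]/.test(ch)) {
      map.push(DIGIT);
    } else if (/\s/.test(ch)) {
      map.push(SPACE);
    } else {
      map.push(OTHER);
    }
  }
  return map;
}

console.log(sketchClassify('a1 !')); // [ 'letter', 'digit', 'space', 'other' ]
```

Building the map once lets later boundary checks compare neighboring entries without re-examining the characters themselves.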

_isWordBoundary

protected static Boolean _isWordBoundary ( map , index )

Returns true if there is a word boundary between the specified character index and the next character index (or the end of the string).

Note that there are always word breaks at the beginning and end of a string, so _isWordBoundary(_classify(''), 0) and _isWordBoundary(_classify('a'), 0) will both return true.

Parameters:
map <Array> Character classification map generated by _classify.
index <Number> Character index to test.
Returns: Boolean
true for a word boundary, false otherwise.

getUniqueWords

static Array getUniqueWords ( string , options )
Returns an array containing only unique words from the specified string. For example, the string 'foo bar baz foo' would result in the array ['foo', 'bar', 'baz'].
Parameters:
string <String> String to split.
options <Object> (optional) Options (see getWords() for details).
Returns: Array
Array of unique words.
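
The deduplication described above can be sketched as follows. This is not the library's implementation: splitOnWhitespace and uniqueWords are hypothetical helpers, the splitter breaks on whitespace only (far cruder than the Unicode algorithm), and first-occurrence ordering is an assumption drawn from the 'foo bar baz foo' example.

```javascript
// Hypothetical stand-in splitter: breaks on runs of whitespace only.
function splitOnWhitespace(string) {
  return string.split(/\s+/).filter(function (word) {
    return word.length > 0;
  });
}

// Keep the first occurrence of each word, preserving the original order.
function uniqueWords(string) {
  var seen = Object.create(null); // no inherited keys such as "constructor"
  var result = [];
  splitOnWhitespace(string).forEach(function (word) {
    if (!seen[word]) {
      seen[word] = true;
      result.push(word);
    }
  });
  return result;
}

console.log(uniqueWords('foo bar baz foo')); // [ 'foo', 'bar', 'baz' ]
```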

getWords

static Array getWords ( string , options )
Splits the specified string into an array of individual words.
Parameters:
string <String> String to split.
options <Object> (optional) Options object containing zero or more of the following properties:
ignoreCase (Boolean)
If true, the string will be converted to lowercase before being split. Default is false.
includePunctuation (Boolean)
If true, the returned array will include punctuation characters. Default is false.
includeWhitespace (Boolean)
If true, the returned array will include whitespace characters. Default is false.
Returns: Array
Array of words.
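
To illustrate how the three options interact, here is a simplified sketch. The function sketchGetWords is hypothetical and is not the library's code: it treats ASCII word characters as words, groups each run of whitespace into one token, and treats every other character as punctuation. The real method may tokenize differently in all three respects.

```javascript
// Hypothetical simplified tokenizer; the real algorithm follows UAX #29.
function sketchGetWords(string, options) {
  options = options || {};
  if (options.ignoreCase) {
    string = string.toLowerCase();
  }
  // Tokens: runs of word characters, runs of whitespace,
  // or single characters that are neither.
  var tokens = string.match(/\w+|\s+|[^\w\s]/g) || [];
  return tokens.filter(function (token) {
    if (/^\s+$/.test(token)) {
      return !!options.includeWhitespace;
    }
    if (/^[^\w\s]$/.test(token)) {
      return !!options.includePunctuation;
    }
    return true; // ordinary word
  });
}

console.log(sketchGetWords('Hello, World!'));
// [ 'Hello', 'World' ]
console.log(sketchGetWords('Hello, World!', { ignoreCase: true }));
// [ 'hello', 'world' ]
console.log(sketchGetWords('Hello, World!', { includePunctuation: true }));
// [ 'Hello', ',', 'World', '!' ]
```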

isWordBoundary

static Boolean isWordBoundary ( string , index )

Returns true if there is a word boundary between the specified character index and the next character index (or the end of the string).

Note that there are always word breaks at the beginning and end of a string, so isWordBoundary('', 0) and isWordBoundary('a', 0) will both return true.

Parameters:
string <String> String to test.
index <Number> Character index to test within the string.
Returns: Boolean
true for a word boundary, false otherwise.
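
The "between this index and the next" convention can be sketched as below. The function sketchIsWordBoundary is hypothetical and much simpler than the Unicode rules: it reports a break unless both neighboring characters are ASCII word characters. Returning false for out-of-range indexes is an assumption; this page does not say how such indexes are handled.

```javascript
// Hypothetical simplified boundary test, not the library's implementation.
function sketchIsWordBoundary(string, index) {
  // Assumption: out-of-range indexes are not boundaries,
  // except index 0 of an empty string.
  if (index < 0 || (index > string.length - 1 && index !== 0)) {
    return false;
  }
  // The last character is followed by the end of the string,
  // which always counts as a break.
  if (index >= string.length - 1) {
    return true;
  }
  var isWordChar = function (ch) {
    return /\w/.test(ch);
  };
  // Break unless both neighbors are word characters.
  return !(isWordChar(string.charAt(index)) &&
           isWordChar(string.charAt(index + 1)));
}

console.log(sketchIsWordBoundary('', 0));        // true
console.log(sketchIsWordBoundary('a', 0));       // true
console.log(sketchIsWordBoundary('foo bar', 0)); // false ("f" and "o" belong to one word)
console.log(sketchIsWordBoundary('foo bar', 2)); // true  (between "o" and the space)
```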


Copyright © 2011 Yahoo! Inc. All rights reserved.