Text.pm


NAME

Communiware::Format::Text - preprocessing routines called on item texts before they are placed into the database


SYNOPSIS

    $text = Communiware::Format::Text::ToInternal($text,{SERVER=>someserver,..});
    $text = Communiware::Format::Text::FromInternal($ctx,$text);


DESCRIPTION

This module provides preprocessing functions used by the various posting interfaces to convert data into HTML, which is then stored in the Communiware database.

ToInternal

Preprocesses text entered in the REPLY form or in the posting form. Arguments: the text and a reference to the hash of item attributes.

The hash should contain at least a SERVER element, which is used to resolve cmw: links that have no explicit site specification.

Adds two elements to the hash: HREF, whose value is the list of links to other Communiware items found in the text, and SRC_REPR, with the value Text.

Recognizes the following formatting:

_word_
converted to <EM>

*word*
converted to <STRONG>

line break
converted to <BR>

empty line
converted to <P>

valid URL
URLs starting with http, ftp or https are converted to hyperlinks.

cmw:item(text)
References to Communiware items of the form cmw:item_id(text) are converted to <A HREF="/server/item_id">text</A>. Note: nested parentheses in text are not handled correctly.
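The cmw: rewriting and the HREF side effect can be sketched roughly as follows. This is an illustration only, not the module's code: the helper name link_cmw_items, the assumption that item ids match \w+, and the choice to store the resolved paths in HREF are all hypothetical. Links with an explicit site specification are not handled, and text containing nested parentheses is simply left unmatched.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Communiware::Format::Text.
# Rewrites cmw:item_id(text) references and records every target
# in $attrs->{HREF}, replacing whatever was stored there before.
sub link_cmw_items {
    my ($text, $attrs) = @_;
    my @found;
    $text =~ s{cmw:(\w+)\(([^()]*)\)}{
        my $target = "/$attrs->{SERVER}/$1";   # resolve against SERVER
        push @found, $target;
        qq{<A HREF="$target">$2</A>};
    }ge;
    $attrs->{HREF} = \@found;
    return $text;
}

my %attrs = (SERVER => 'someserver');
my $html  = link_cmw_items('See cmw:1234(the spec) first.', \%attrs);
print "$html\n";
print "links: @{ $attrs{HREF} }\n";
```

Passing the attribute hash by reference is what makes the side effect visible to the caller, which mirrors the calling convention described for ToInternal.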

Syntax:

ToInternal $text,\%attributes,format

Side effect: replaces the value of $attributes{HREF} with the list of links to items found in the text.

Format can be one of:

spar
Strict paragraphs - the resulting HTML always starts with <P> and ends with </P>, and can consist of multiple paragraphs.

rpar
Relaxed paragraphs - if the text consists of multiple paragraphs, they are separated with <P></P>, but the surrounding <P> and </P> are never output. This is the default.

npar
No paragraphs. Multiple consecutive newlines in the input are treated as a single newline.

plain
No paragraphs, no hyperlinks, and no other HTML tags.

bare
Even plainer than 'plain': no tags at all and no HTML escaping.

The bare format is useful for text analysers and other tools that could be confused by control character sequences.
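The difference between the three paragraph formats can be illustrated with a small sketch. It is not the module's implementation: the function name join_paragraphs and the exact placement of newlines in the output are assumptions, and the plain and bare formats are left out.

```perl
use strict;
use warnings;

# Hypothetical sketch of the spar / rpar / npar paragraph modes.
sub join_paragraphs {
    my ($text, $format) = @_;
    $format = 'rpar' unless defined $format;    # rpar is the documented default
    $text =~ s/\A\s+//;                         # drop leading blank space
    $text =~ s/\s+\z//;                         # drop trailing blank space
    my @paras = grep { length } split /\n{2,}/, $text;
    s/\n/<BR>\n/g for @paras;                   # single line break -> <BR>
    # npar: a run of newlines counts as one line break
    return join("<BR>\n", @paras)      if $format eq 'npar';
    # rpar: separator only, no surrounding <P>...</P>
    return join("\n<P></P>\n", @paras) if $format eq 'rpar';
    # spar: every paragraph wrapped in <P>...</P>
    return join("\n", map { "<P>$_</P>" } @paras);
}

my $input = "one\ntwo\n\nthree\n";
for my $format (qw(spar rpar npar)) {
    print "[$format]\n", join_paragraphs($input, $format), "\n";
}
```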

return_defined


INTERNAL FUNCTIONS

string2html

string2html - applies the bold/italic transformations to a string. It also converts URLs into hyperlinks.
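A minimal sketch of this kind of inline transformation is shown below. It is not the module's string2html: the function name mark_inline and the regular expressions are assumptions, and URL conversion is omitted. The tags follow the ToInternal description (_word_ to <EM>, *word* to <STRONG>).

```perl
use strict;
use warnings;

# Hypothetical sketch of an inline markup pass.
sub mark_inline {
    my ($s) = @_;
    # Escape HTML metacharacters first, so the tags added below survive.
    my %ent = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $s =~ s/([&<>"])/$ent{$1}/g;
    # *one or more words* -> <STRONG>...</STRONG>
    $s =~ s{(?<!\w)\*(\w+(?:\s+\w+)*)\*(?!\w)}{<STRONG>$1</STRONG>}g;
    # _one or more words_ -> <EM>...</EM>; snake_case names are left alone
    $s =~ s{(?<!\w)_([^\W_]+(?:\s+[^\W_]+)*)_(?!\w)}{<EM>$1</EM>}g;
    return $s;
}

print mark_inline('a *bold* and _em_ <tag> snake_case_name'), "\n";
```

Escaping before tagging matters: done in the opposite order, the generated <STRONG> and <EM> tags would themselves be turned into entities.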

text2html

This function applies the transformations to the whole text.


NAME

Communiware::Format::Text - mark up text as HTML


DESCRIPTION

The text2html function marks up plain text as HTML. By default it expands tabs and converts HTML metacharacters into the corresponding entities. More complicated transformations, such as splitting the text into paragraphs or marking up bulleted lists, can be carried out by setting the appropriate options.


SUMMARY OF OPTIONS

These options always apply:

    urls         Convert URLs to links
    email        Convert email addresses to links
    bold         Mark up words with *asterisks* in bold
    emphased     Mark up words with _underscores_ as underlined

You can then choose to treat the text according to one of these options:

    paras        Treat text as paragraph-oriented

The following options apply when the paras option is specified:

    bullets      Mark up bulleted paragraphs as unordered list
    numbers      Mark up numbered paragraphs as ordered list

text2html will issue a warning if it is passed nonsensical options, for example headings but not paras. These warnings can be suppressed by setting $Communiware::Format::Text::QUIET to true.


OPTIONS

bold
Words surrounded with asterisks are marked up in bold, so *abc* becomes <B>abc</B>.

emphased
Words surrounded with underscores are marked up as underlined, so _abc_ becomes <U>abc</U>.

urls
Spots Uniform Resource Locators (URLs) in the text and converts them to links. For example,

    See https://perl.com/.

becomes

    See <TT><A HREF="https://perl.com/">https://perl.com/</A></TT>.
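The URL example above can be reproduced with a short sketch. The function name link_urls and the regular expression are assumptions made for illustration, not the pattern the module actually uses. The sketch matches only http, https and ftp URLs and keeps trailing punctuation outside the link.

```perl
use strict;
use warnings;

# Hypothetical sketch of the urls option.
sub link_urls {
    my ($text) = @_;
    # The last character of a URL may not be sentence punctuation,
    # so "https://perl.com/." links only "https://perl.com/".
    $text =~ s{\b((?:https?|ftp)://[^\s<>"]*[^\s<>".,;:!?)])}{<TT><A HREF="$1">$1</A></TT>}g;
    return $text;
}

print link_urls('See https://perl.com/.'), "\n";
```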


SEE ALSO

The HTML::Entities module (part of the LWP package) provides functions for encoding and decoding HTML entities.

Tom Christiansen has a complete implementation of RFC 822 structured field bodies. See http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz.

Seth Golub's txt2html utility does everything that HTML::FromText does, and a few things that it would like to do. See http://www.thehouse.org/txt2html/.

RFC 822: "Standard for the Format of ARPA Internet Text Messages" describes the syntax of email addresses (the more esoteric features of structured field bodies, in particular quoted-strings, domain literals and comments, are not recognized by HTML::FromText). See ftp://src.doc.ic.ac.uk/rfc/rfc822.txt.

RFC 1630: "Universal Resource Identifiers in WWW" lists the protocols that may appear in URLs. HTML::FromText also recognizes "https:", but ignores "file:" because experience suggests that it results in too many false positives. See ftp://src.doc.ic.ac.uk/rfc/rfc1630.txt.


AUTHOR

Gareth Rees <garethr@cre.canon.co.uk>. The code was heavily adapted for Communiware, and the leftover code still needs to be removed.

16 October 2007 13:44