We illustrate several ways to separate the following string into substrings.
i1 : s = "This is an example of a string.\nIt contains some letters, spaces, and punctuation.\r\nIt also contains some new line characters.\r\nIn fact, for some reason, both Unix-style\nand Windows-style\r\nnew line characters are present."

o1 = This is an example of a string.
     It contains some letters, spaces, and punctuation.
     It also contains some new line characters.
     In fact, for some reason, both Unix-style
     and Windows-style
     new line characters are present.
The command separate(s) breaks s at every occurrence of "\r\n" or "\n".
i2 : separate(s)

o2 = {This is an example of a string., It contains some letters, spaces, and
     ------------------------------------------------------------------------
     punctuation., It also contains some new line characters., In fact, for
     ------------------------------------------------------------------------
     some reason, both Unix-style, and Windows-style, new line characters are
     ------------------------------------------------------------------------
     present.}

o2 : List
This is equivalent to using the lines function.
i3 : lines s

o3 = {This is an example of a string., It contains some letters, spaces, and
     ------------------------------------------------------------------------
     punctuation., It also contains some new line characters., In fact, for
     ------------------------------------------------------------------------
     some reason, both Unix-style, and Windows-style, new line characters are
     ------------------------------------------------------------------------
     present.}

o3 : List
Instead of breaking at new line characters, we can specify which character to break at. For instance, we can separate at every comma:
i4 : separate(",", s)

o4 = {This is an example of a string.,  spaces,  and punctuation.
      It contains some letters                  It also contains some new
                                                In fact
     ------------------------------------------------------------------------
                     ,  for some reason,  both Unix-style                }
     line characters.                    and Windows-style
                                         new line characters are present.

o4 : List
or at every space:
i5 : separate(" ", s)

o5 = {This, is, an, example, of, a, string., contains, some, letters,,
                                    It
     ------------------------------------------------------------------------
     spaces,, and, punctuation., also, contains, some, new, line,
                   It
     ------------------------------------------------------------------------
     characters., fact,, for, some, reason,, both, Unix-style, Windows-style,
     In                                            and         new
     ------------------------------------------------------------------------
     line, characters, are, present.}

o5 : List
In the last two examples we can see line breaks appear in the output substrings, since we are no longer separating at them. (They are printed in the console as actual new lines, not using escape characters.)
Now let’s try breaking at the string "om". This occurs three times in our string (in three uses of the word "some"), so s is separated into four substrings. The separating characters "om" do not appear in any of the substrings.
i6 : t = separate("om", s)

o6 = {This is an example of a string., e letters, spaces, and punctuation.,
      It contains s                    It also contains s
     ------------------------------------------------------------------------
     e new line characters., e reason, both Unix-style       }
     In fact, for s          and Windows-style
                             new line characters are present.

o6 : List
We can recover the original string using the demark function.
i7 : demark("om", t)

o7 = This is an example of a string.
     It contains some letters, spaces, and punctuation.
     It also contains some new line characters.
     In fact, for some reason, both Unix-style
     and Windows-style
     new line characters are present.
In general, s = demark(x, separate(x, s)). The exception to this rule is that demark("\n", separate(s)) isn’t necessarily equal to s; this code will replace any "\r\n" line breaks in s with "\n" characters.
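The round-trip rule and its exception can be checked directly. The following is a minimal sketch, assuming separate, lines, and demark behave as described above; the short strings u and v are hypothetical and are not part of the running example.

```macaulay2
-- u is a hypothetical string mixing a Unix ("\n") and a Windows ("\r\n") line break
u = "first\nsecond\r\nthird"

-- with an explicit separator, separating and rejoining gives back the input,
-- including any empty substrings between adjacent separators
assert(demark(",", separate(",", "a,b,,c")) == "a,b,,c")
assert(demark("\n", separate("\n", u)) == u)

-- with no separator given, "\r\n" also counts as a break point,
-- so rejoining with "\n" turns every line break into the Unix style
v = demark("\n", separate(u))
assert(v == "first\nsecond\nthird")
assert(v != u)
```

Because of this, demark("\n", separate(s)) can serve as a simple way to convert line breaks to the Unix style, provided that losing the original "\r\n" sequences is acceptable.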
To separate at a string longer than 2 characters, or for much greater flexibility and control in specifying separation rules, see separateRegexp.