Search in paragraph's text: Difference between revisions
(julia example) |
No edit summary |
||
Line 1: | Line 1: | ||
{{draft task}} |
{{draft task}} |
||
The goal is to verify the presence of a word or regular expression within several paragraphs of text (structured or not) and to |
The goal is to verify the presence of a word or regular expression within several paragraphs of text (structured or not) and to format the relevant paragraphs before writing them to standard output. |
||
So here, let’s imagine that we are trying to verify the presence of a keyword "SystemError" within the paragraphs "Traceback (most recent call last):" in the file Traceback.txt |
For example, suppose we want to verify the presence of the keyword "SystemError" within what I will call the "Traceback (most recent call last):" paragraphs of the file Traceback.txt |
||
<pre> |
<pre> |
||
Line 78: | Line 78: | ||
</pre> |
</pre> |
||
The expected result must be |
The expected result must be formatted with ---------------- as the paragraph separator AND "Traceback (most recent call last):" at the beginning of each relevant paragraph: |
||
<pre> |
<pre> |
||
Line 117: | Line 117: | ||
Using the awk "Record Separator" : |
Using the awk "Record Separator" : |
||
<lang awk> |
<lang awk> |
||
awk 'BEGIN { RS = "" ; ORS = "\n----------------\n" } |
|||
awk -v ORS='\n\n' '/SystemError/ { print RS $0 }' RS="Traceback" Traceback.txt |\ |
|||
/Traceback/ && /SystemError/ |
|||
awk -v ORS='\n----------------\n' '/Traceback/' RS="\n\n" |
|||
{ print substr($0,index($0,"Traceback")) }' Traceback.txt |
|||
</lang> |
</lang> |
||
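Note: RS is changed from its default "\n" to "" so that each blank-line-separated paragraph is read as a single record, which lets the index function locate "Traceback" within the whole paragraph.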
|||
Note: the first awk searches the paragraphs for an expression (regular or not), and the second awk formats the output |
|||
ORS is modified to separate the paragraphs with "\n----------------\n"
|||
Each printed paragraph must contain both "Traceback" and "SystemError"
|||
substr extracts the text from the first occurrence of "Traceback" to the end of the paragraph, dropping anything that precedes it. Only paragraphs matching both "Traceback" and "SystemError" are processed this way.
|||
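The notes on RS, ORS and substr can be tried out with a small self-contained sketch. The function name filter_tracebacks and the sample input are hypothetical and used only for illustration; they are not the contents of Traceback.txt.

```shell
# Hypothetical helper: the name and the sample input below are assumptions
# made for illustration, not part of the task's Traceback.txt.
filter_tracebacks() {
    awk '
        # RS = "" switches awk to paragraph mode (records split on blank lines);
        # ORS makes every printed record end with a dashed separator line.
        BEGIN { RS = ""; ORS = "\n----------------\n" }

        # Keep only paragraphs that contain both words.
        /Traceback/ && /SystemError/ {
            # index() returns the 1-based position of the first "Traceback";
            # substr() keeps everything from there to the end of the record.
            print substr($0, index($0, "Traceback"))
        }
    ' "$@"
}

# Three sample paragraphs: only the first contains both words.
printf '%s\n' \
    'noise before' \
    'Traceback (most recent call last):' \
    '  File "a.py", line 1' \
    'SystemError: boom' \
    '' \
    'Traceback (most recent call last):' \
    '  File "b.py", line 2' \
    'ValueError: nope' \
    '' \
    'just a note mentioning SystemError' |
filter_tracebacks
```

With this sample, only the first paragraph is printed, starting at "Traceback (most recent call last):" (the "noise before" line is dropped) and followed by the dashed separator line. Passing a file name instead of piping, as in filter_tracebacks Traceback.txt, should behave the same way, since the arguments are handed to awk unchanged.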
=={{header|J}}== |
=={{header|J}}== |