Search in paragraph's text

{{draft task}}


The goal is to check whether a word or regular expression occurs within several paragraphs of text (structured or not), and to format the matching paragraphs before printing them on standard output.


So here, let’s imagine that we want to check for the keyword "SystemError" within the "Traceback (most recent call last):" paragraphs of the file Traceback.txt, that is, the blocks that start with that line:


<pre>
</pre>


The expected result must use "----------------" as the paragraph separator, and each matching paragraph must begin with "Traceback (most recent call last):":


<pre>
Using the awk "Record Separator" (RS):
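<lang awk>
# Stage 1: split the input on "Traceback" (a multi-character RS acts as a regex),
# keep the records that mention SystemError, and print each one with the
# "Traceback" prefix that RS consumed, followed by a blank line (ORS).
# Stage 2: split stage 1's output on blank lines, keep only the blocks that
# contain "Traceback", and end each one with a dashed separator line.
awk -v ORS='\n\n' '/SystemError/ { print RS $0 }' RS="Traceback" Traceback.txt |\
awk -v ORS='\n----------------\n' '/Traceback/' RS="\n\n"
</lang>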
Note: the first awk searches the paragraphs for an expression (a plain string or a regular expression), and the second awk formats the output. POSIX leaves the behavior of a multi-character RS unspecified, so this relies on an awk, such as GNU awk, that treats it as a regular expression.
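
The pipeline can be run on a small file to see what each stage contributes. This is a hypothetical sketch. The sample lines only imitate the layout of Traceback.txt: log lines, blank-line separated blocks, two tracebacks that mention SystemError and one that does not. The sketch also assumes an awk, such as GNU awk, that treats a multi-character RS as a regular expression.

```shell
#!/bin/sh
# Hypothetical sample input: it only imitates the layout of Traceback.txt.
dir=$(mktemp -d)
cat > "$dir/Traceback.txt" <<'EOF'
[Info] starting
Traceback (most recent call last):
  File "a.py", line 1, in <module>
SystemError: sample error one

[Info] more log text
Traceback (most recent call last):
  File "b.py", line 2, in <module>
ValueError: not the keyword

Traceback (most recent call last):
  File "c.py", line 3, in <module>
SystemError: sample error two
EOF

# The same two-stage pipeline, pointed at the sample file.
# Assumption: this awk treats a multi-character RS as a regular expression.
result=$(awk -v ORS='\n\n' '/SystemError/ { print RS $0 }' RS="Traceback" "$dir/Traceback.txt" |
         awk -v ORS='\n----------------\n' '/Traceback/' RS="\n\n")
printf '%s\n' "$result"
rm -r "$dir"
```

Only the two SystemError tracebacks should survive. The first stage drops the ValueError traceback. The second stage drops the "[Info] more log text" line that the first stage carried along, because that line sits in a block without "Traceback". With this sample, an empty line may also follow the first separator: stage 1 leaves three newlines before the second "Traceback", and the second awk splits on exactly two.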

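The two-stage pipeline relies on a multi-character RS, which POSIX does not specify. A possible POSIX-only alternative, sketched here as a swapped-in technique rather than the author's solution, uses paragraph mode (RS set to the empty string). Each blank-line-separated paragraph becomes one record. Any matching paragraph is printed from its first "Traceback" onward. It assumes each traceback shares its paragraph only with log lines above it, never with a second traceback. The sample file is hypothetical.

```shell
#!/bin/sh
# Hypothetical sample input imitating the layout of Traceback.txt.
dir=$(mktemp -d)
cat > "$dir/Traceback.txt" <<'EOF'
[Info] starting
Traceback (most recent call last):
  File "a.py", line 1, in <module>
SystemError: sample error one

[Info] more log text
Traceback (most recent call last):
  File "b.py", line 2, in <module>
ValueError: not the keyword

Traceback (most recent call last):
  File "c.py", line 3, in <module>
SystemError: sample error two
EOF

# RS = "" (POSIX paragraph mode): one record per blank-line-separated paragraph.
# Assumption: at most one traceback per paragraph, preceded only by log lines.
# For a paragraph that mentions SystemError, print it from its first "Traceback".
result=$(awk 'BEGIN { RS = ""; ORS = "\n----------------\n" }
/SystemError/ { i = index($0, "Traceback"); if (i > 0) print substr($0, i) }' "$dir/Traceback.txt")
printf '%s\n' "$result"
rm -r "$dir"
```

Paragraph mode also absorbs runs of several blank lines, so no empty records or stray empty lines reach the output. The trade-off is that the paragraph boundaries must line up with the tracebacks.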

=={{header|J}}==