File A (osuemail) is a data file with duplicate keys. ...

Question by Richard
Submitted on 2/10/2005
Related FAQ: comp.lang.awk FAQ
File A (osuemail) is a data file with duplicate keys.

File B (dup.out) is a list file of only the duplicate keys.

My awk script is supposed to remove the duplicate keys from file A using file B to tell it which ones to remove.

awk -F: '
BEGIN {
    # Read the first duplicate key from file B
    getline dup_ssn < "dup.out"
}

{
    if (dup_ssn != $3)
        print "1", dup_ssn, $3, $8
    else if (dup_ssn == $3 && (substr($8,1,1) ~ /C/ || substr($8,1,3) == 500)) {
        #print "3"
        # Record dropped; advance to the next duplicate key
        getline dup_ssn < "dup.out"
    }
    else
        print "2", dup_ssn, $3, $8
}' /tmp/osuemail | wc -l

All files have been sorted in key sequence.
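The claim that the files are sorted can be checked mechanically, and it matters how they were sorted: the script only ever tests the two keys for equality, so files A and B need to be in the same order under the same comparison rule. The sketch below uses made-up sample rows (the real layout of /tmp/osuemail is not shown here) and `sort -c`, which only checks order and reports the result through its exit status. It illustrates how the same key column can count as sorted lexically but not numerically.

```shell
#!/bin/sh
# Invented sample rows; the key is assumed to be colon-separated field 3.
tmp=$(mktemp -d)
printf 'a:b:100:x\na:b:20:x\na:b:300:x\n' > "$tmp/sample"

# Lexical check on field 3: "100" < "20" < "300", so this passes.
if LC_ALL=C sort -t: -k3,3 -c "$tmp/sample" 2>/dev/null; then
    lexical=yes
else
    lexical=no
fi

# Numeric check on field 3: 100 > 20, so this fails.
if LC_ALL=C sort -t: -k3,3n -c "$tmp/sample" 2>/dev/null; then
    numeric=yes
else
    numeric=no
fi

echo "lexical=$lexical numeric=$numeric"
rm -rf "$tmp"
```

Running the same check with the same options against both /tmp/osuemail and dup.out would show whether the two files really agree on ordering.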

There are 276,990 records in file A.
There are 81 duplicates in file B.

After numerous adjustments to the awk script, I still get a count of 276,970 in the new file C (not shown in the script).
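The counts quoted so far can be cross-checked with simple arithmetic. The snippet assumes, as a guess about the intent, that each of the 81 keys in file B should cause exactly one record to be dropped from file A.

```shell
#!/bin/sh
total=276990      # records in file A
dups=81           # keys listed in file B
observed=276970   # records counted in file C

expected=$((total - dups))      # count if one record per key were dropped (assumption)
removed=$((total - observed))   # records actually dropped, according to the counts
echo "expected=$expected removed=$removed"
```

On these numbers, 20 records were dropped rather than 81.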

Even this is not right: analysing the data in files A, B and C, I find that 43 duplicates were not removed and 38 were removed.
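A tally of this kind can be reproduced with a short awk pass rather than by hand. The sketch below assumes file C has the same colon-separated layout as file A with the key in field 3, and uses a hypothetical file name `fileC` and invented sample rows, since neither appears in the question. It lists every key from dup.out that still occurs more than once in file C, along with how many times.

```shell
#!/bin/sh
# Invented stand-ins for dup.out and file C.
tmp=$(mktemp -d)
printf '222\n333\n' > "$tmp/dup.out"
cat > "$tmp/fileC" <<'EOF'
a:b:111:d:e:f:g:X100
a:b:222:d:e:f:g:X300
a:b:333:d:e:f:g:500abc
a:b:333:d:e:f:g:X400
EOF

still=$(awk -F: '
    # First file (dup.out): remember every listed key.
    NR == FNR { dup[$1]; next }
    # Second file (file C): count occurrences of listed keys only.
    ($3 in dup) { seen[$3]++ }
    # Report keys that are still duplicated.
    END { for (k in seen) if (seen[k] > 1) print k, seen[k] }
' "$tmp/dup.out" "$tmp/fileC" | sort)

echo "$still"
rm -rf "$tmp"
```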

It looks like a sequencing problem to me, but I supposedly took that out of the equation by sorting all the files.

It found the first 8 duplicates.
Missed the next 2 duplicates.
Found the next duplicate.
Missed the next duplicate.
Found the next duplicate.
Missed the next duplicate.
Found the next duplicate.
Missed the next 3 duplicates.
Found the next 2 duplicates.
Missed the next duplicate.
Found the next 2 duplicates.
Missed the next 3 duplicates.
Found the next duplicate.
Missed the next duplicate.
Found the next duplicate.
Missed the next 3 duplicates.
Found the next duplicate.
Missed the next duplicate.
Found the next duplicate.
Missed the next 5 duplicates.
Found the next duplicate.
Missed the next 2 duplicates.
etc.

Is it a timing problem or what?
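On the timing worry: awk reads its input sequentially and `getline < file` is synchronous, so the two files can only drift apart through the script's own logic or through a difference in ordering. One way to take sequencing out of the picture altogether is to load dup.out into an awk array first and then test each record of file A against it. This is a sketch of an alternative, not a diagnosis of the script above; the sample rows are invented, and it assumes dup.out holds one key per line. Note one behavioural difference: it drops every record whose key is listed and whose field 8 starts with C or 500, whereas the original moves on to the next key after the first such record.

```shell
#!/bin/sh
# Invented stand-ins for /tmp/osuemail and dup.out.
tmp=$(mktemp -d)
cat > "$tmp/osuemail" <<'EOF'
a:b:111:d:e:f:g:X100
a:b:222:d:e:f:g:C200
a:b:222:d:e:f:g:X300
a:b:333:d:e:f:g:500abc
a:b:333:d:e:f:g:X400
a:b:444:d:e:f:g:C999
EOF
printf '222\n333\n' > "$tmp/dup.out"

result=$(awk -F: '
    # First file (dup.out): remember every duplicate key.
    NR == FNR { dup[$1]; next }
    # Second file: skip a record only if its key is listed
    # and field 8 starts with "C" or "500".
    ($3 in dup) && (substr($8,1,1) == "C" || substr($8,1,3) == "500") { next }
    # Everything else is kept.
    { print }
' "$tmp/dup.out" "$tmp/osuemail")

echo "$result"
rm -rf "$tmp"
```

Because the lookup is by key rather than by position, the result does not depend on either file being sorted.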

