1. Perfectly clear. Thanks!

2. My second question has more to do with impossible values. The example I gave was MPG. My research told me that an MPG above 60 is very unlikely for a fleet vehicle. Comparing that assumption with the distribution (percentiles) of MPG in the dataset led me to conclude that the values outside that range can only be data entry errors.
I agree that discarding values in the tails of the distribution can't be applied all the time.
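
To make the MPG check concrete, here is a rough sketch of what I mean: take the domain limit (60 MPG, from my research), see where it falls in the distribution, and flag whatever lies above it. The sample values and the function names are made up for illustration; they are not from my actual dataset.

```python
from bisect import bisect_right

# Domain assumption from the comment above: a fleet vehicle is very
# unlikely to exceed 60 MPG. This is a judgment call, not a hard fact.
MAX_PLAUSIBLE_MPG = 60.0


def percentile_rank(values, threshold):
    """Return the percentage of values that are <= threshold."""
    ordered = sorted(values)
    return 100.0 * bisect_right(ordered, threshold) / len(ordered)


def split_implausible(values, threshold=MAX_PLAUSIBLE_MPG):
    """Separate values within the plausible limit from those above it."""
    kept = [v for v in values if v <= threshold]
    flagged = [v for v in values if v > threshold]
    return kept, flagged


# Hypothetical sample, invented for this example.
mpg = [18.2, 22.5, 24.0, 27.1, 29.8, 31.4, 33.0, 35.6, 612.0, 250.0]

rank = percentile_rank(mpg, MAX_PLAUSIBLE_MPG)
kept, flagged = split_implausible(mpg)

print(f"{rank:.1f}% of values are at or below {MAX_PLAUSIBLE_MPG} MPG")
print("flagged as likely data entry errors:", flagged)
```

The point is that the cutoff comes from knowledge about the vehicles, and the percentiles are only used to check how much data it affects. Trimming a fixed share of the tails would be a different technique.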

3. Thanks again for the explanation!!!

I just bought your book, "Data Mining Techniques, 2nd edition", last month, and I'm finding it very, very good!!!
Datalligence