Similar to applying sanity checks to external data sources, it is important to check that data generated internally is consistent and does not introduce errors or bugs.
In many cases, custom code is written to merge data attributes into new features. Such code must be unit-tested, both to ensure that it does not introduce functional bugs and to ensure that the returned data matches the values expected when training a machine learning algorithm.
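As a minimal sketch of this idea, consider a hypothetical feature that merges two raw attributes (weight and height) into a derived BMI feature; the function and its unit test below are illustrative, not taken from any particular codebase. The test checks both the expected numeric value and the edge cases that would otherwise feed invalid values into training.

```python
import math

def bmi_feature(weight_kg, height_m):
    """Derive a BMI feature from two raw attributes.

    Returns None for invalid inputs so that downstream code can
    filter them out instead of training on nonsense values.
    (Hypothetical example for illustration.)
    """
    if weight_kg is None or height_m is None or height_m <= 0:
        return None
    return weight_kg / (height_m ** 2)

def test_bmi_feature():
    # Typical value: 70 kg at 1.75 m is roughly 22.86.
    assert math.isclose(bmi_feature(70.0, 1.75), 70.0 / 1.75 ** 2)
    # Edge cases that would otherwise poison the training set.
    assert bmi_feature(70.0, 0.0) is None
    assert bmi_feature(None, 1.75) is None
```

A test runner such as pytest would pick up `test_bmi_feature` automatically; the key point is that both the functional behavior and the shape of the returned data are pinned down before the feature reaches the model.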
Failing to test the feature extraction code can lead to unintended bugs with severe impact on the final model. Such bugs are hard to detect and remove because they span several data sources and pieces of functionality.
If automatically extracted features are used, they should likewise be tested for correctness.
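One way to test automatically extracted features, sketched below under the assumption that they arrive as a NumPy matrix, is a sanity check on shape, missing values, and value ranges; the function name, expected column count, and bounds are all illustrative assumptions to be adapted to the actual extraction pipeline.

```python
import numpy as np

def check_extracted_features(X, expected_cols, value_range=(-1e6, 1e6)):
    """Sanity-check an automatically extracted feature matrix.

    The column count and value range are illustrative assumptions;
    adapt them to the actual extraction pipeline.
    """
    lo, hi = value_range
    assert X.ndim == 2, "feature matrix must be 2-D"
    assert X.shape[1] == expected_cols, "unexpected number of features"
    assert not np.isnan(X).any(), "NaNs indicate a broken extraction step"
    assert ((X >= lo) & (X <= hi)).all(), "feature values out of range"
    return True
```

Running such a check as part of the pipeline turns silent extraction failures (a dropped column, a NaN-producing join) into immediate, localized test failures.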