Investigating memory prefetcher performance over parallel applications: From real to simulated