1. 27 Apr, 2021 1 commit
  2. 22 Apr, 2021 2 commits
  3. 16 Apr, 2021 1 commit
  4. 13 Apr, 2021 5 commits
  5. 12 Apr, 2021 3 commits
  6. 09 Apr, 2021 6 commits
  7. 08 Apr, 2021 1 commit
  8. 06 Apr, 2021 1 commit
  9. 02 Apr, 2021 6 commits
  10. 31 Mar, 2021 1 commit
    • Big simplification of get_loop_info · fe7a71c2
      Cyril SIX authored
      Another remnant of trying to devise a complicated algorithm for a
      problem that was, in fact, very simple: I just had to check whether the
      branch was within the loop body.
      
      I tested it functionally on the benchmarks: only heapsort changes, and
      it gets slightly worse (4-5%), because the old get_loop_info had made a
      buggy guess that happened to be lucky for that particular case.
      
      The other benchmarks are unchanged: the predictions stay exactly the same.
      
      The get_loop_info could potentially be improved by having a natural loop
      detection that extends to outer loops (not just inner loops), though I
      expect the performance improvements would be very small.
      fe7a71c2
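
To illustrate the check described in fe7a71c2, here is a minimal Python sketch, not the actual get_loop_info code. It assumes the CFG is given as a predecessor map and that a back edge (tail -> header) is already known; the names natural_loop_body and branch_in_loop are hypothetical.

```python
def natural_loop_body(preds, header, tail):
    """Nodes of the natural loop of the back edge tail -> header.

    preds maps a node to the list of its predecessors (assumed input shape).
    """
    body = {header}
    stack = [tail]
    while stack:
        node = stack.pop()
        if node not in body:
            body.add(node)
            # Walk backwards; the header is already in body, so we stop there.
            stack.extend(preds.get(node, []))
    return body


def branch_in_loop(branch_pc, body):
    """The simple check: is the branch located inside the loop body?"""
    return branch_pc in body


# Tiny CFG: 1 -> 2 -> 3, 3 -> 2 (back edge), 3 -> 4 (loop exit)
preds = {2: [1, 3], 3: [2], 4: [3]}
body = natural_loop_body(preds, header=2, tail=3)
```

With this CFG, body contains only nodes 2 and 3, so a branch at node 3 is reported as inside the loop and a branch at node 4 is not. Extending the detection to outer loops, as the message suggests, would mean computing such a body for every back edge rather than only the innermost one.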
  11. 30 Mar, 2021 4 commits
  12. 29 Mar, 2021 2 commits
    • Simplifications on Linearize - details below · 67cfb5b6
      Cyril SIX authored
      While I was developing the new trace linearize, I started off by
      implementing a big algorithm reasoning on dependencies etc., but I
      realized later that its performance differed too much (sometimes
      better, sometimes worse) from that of the original CompCert. So I
      stripped it down gradually until its performance (on regular code with
      just branch prediction) was on par with the base Linearize of CompCert.
      
      I was aiming here for something that is either equal, or better, in
      terms of performance.
      
      My (then and current) theory is that I have stripped it down so much that
      now it's just like the algorithm of CompCert, but with a modification
      for Lcond instructions (see the new linearize_aux_cb). However, I never
      tested that theory: the code worked, so I left it as is, without any
      simplification. But now that I need to get a clear version for my
      manuscript, I'm digging into it.
      
      It turns out my theory is not quite accurate.
      A difference is that instead of taking the minpc across the chain, I take
      the pc of the very first block of the chain I create. This was (I think)
      out of laziness in the middle of two iterations, except that I forgot
      about it.
      
      I tested my new theory by deleting all the stuff about dependencies
      calculation (committed), and also computing a minpc just like the
      original CompCert (not committed): I get exactly the same Mach code as
      with linearize_aux_cb.
      
      So right now, the only difference between linearize_aux_cb and
      linearize_aux_trace is that slightly different minpc computation.
      
      I think transitioning to linearize_aux_cb 1) will be much clearer than
      this Frankenstein monster of linearize_aux_trace that I made, and 2)
      might perform better too.
      
      I don't have access to Kalray machines today so I'm leaving this on hold
      for now, but tomorrow I will run performance tests to see if there is a
      regression. If there isn't, I will commit this (and it will be the
      version narrated by my manuscript).
      
      If there is a regression, it would mean that selecting the pc of the
      first node (as opposed to the minpc) performs better, so I'd port that
      choice back into linearize_aux_cb anyway and there should then be no
      difference in the generated code.
      67cfb5b6
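
The difference described in 67cfb5b6 between the two chain keys can be sketched as follows. This is an illustrative Python model, not the CompCert or linearize_aux_trace code: a chain is assumed to be a list of block pcs in chain order, the function names are hypothetical, and the descending sort direction is an assumption made only for the example.

```python
def key_minpc(chain):
    """Key used by the original heuristic as described: smallest pc in the chain."""
    return min(chain)


def key_first_pc(chain):
    """Key described for linearize_aux_trace: pc of the chain's very first block."""
    return chain[0]


def order_chains(chains, key):
    """Order chains by the given key (descending order is an assumption here)."""
    return sorted(chains, key=key, reverse=True)


# Two chains; the first one does not start at its smallest pc.
chains = [[12, 7, 9], [10, 11]]
by_minpc = order_chains(chains, key_minpc)
by_first = order_chains(chains, key_first_pc)
```

Here the first chain has minpc 7 but starts at pc 12, so the two keys place it on opposite sides of the second chain. When every chain starts at its smallest pc, both keys give the same order, which is consistent with the two variants often producing identical code.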
    • adding test for load replacement on a64 · 706b6384
      Léo Gourdin authored
      706b6384
  13. 26 Mar, 2021 4 commits
  14. 23 Mar, 2021 1 commit
  15. 21 Mar, 2021 1 commit
  16. 10 Mar, 2021 1 commit