MATLAB: Quickly sanitize a string array for use as table variables

For my genomics project, I’ve been generating a lot of MATLAB tables where each column represents variants at a specific gene locus. The problem is that many gene names do not conform to MATLAB’s rules for a valid table variable name.

After a bit of digging I found there are actually two MATLAB functions that can be used to automate the cleanup.

  matlab.lang.makeValidName()

  matlab.lang.makeUniqueStrings()
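
As a quick sketch of what each one does on its own (the variable names here are just placeholders of mine): makeValidName fixes every name independently, and makeUniqueStrings only deals with repeats.

```matlab
% makeValidName repairs each name by itself; the optional second output
% flags which entries had to be changed.
[valid, wasModified] = matlab.lang.makeValidName({'1var', 'my var', 'v#3'});
% valid       -> {'x1var', 'myVar', 'v_3'}
% wasModified -> [true true true]

% makeUniqueStrings leaves the first occurrence alone and appends
% _1, _2, ... to later repeats.
uniqueNames = matlab.lang.makeUniqueStrings({'x1var', 'x1var', 'myVar'});
% uniqueNames -> {'x1var', 'x1var_1', 'myVar'}
```

Neither is enough alone: two different bad names can collapse to the same valid name, and a unique name is not necessarily a valid one.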

Together, these functions can fully sanitize any list I throw at them. For example…

T = array2table(randi(6,6))

    Var1    Var2    Var3    Var4    Var5    Var6
    ____    ____    ____    ____    ____    ____
     4       2       1       4       5       1  
     3       5       5       5       3       6  
     2       5       2       4       5       4  
     2       5       5       3       3       1  
     2       3       3       3       5       1  
     3       6       2       5       6       3  

badList = {'1var' '1var' ' 1var' 'my var' 'v#3' 'alpha and Ω'}

okList = matlab.lang.makeValidName(badList)

validList = matlab.lang.makeUniqueStrings(okList)

T.Properties.VariableNames = validList;

    x1var    x1var_1    x1var_2    myVar    v_3    alphaAnd_
    _____    _______    _______    _____    ___    _________
      4         2          1         4       5         1    
      3         5          5         5       3         6    
      2         5          2         4       5         4    
      2         5          5         3       3         1    
      2         3          3         3       5         1    
      3         6          2         5       6         3
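
One thing the renaming loses is the original label. Below is a minimal sketch of one way to keep it around, assuming it is acceptable to store the raw names in the table's VariableDescriptions property. The gene-style labels in rawNames are made up for illustration and are not from my actual data.

```matlab
% Hypothetical gene-style labels; none are valid MATLAB identifiers as
% written, and the first two are duplicates.
rawNames = {'HLA-DRB1', 'HLA-DRB1', '5S rRNA', 'BRCA1/2'};

% Step 1: make each name a valid identifier. changed(i) is true where
% name i had to be altered.
[validNames, changed] = matlab.lang.makeValidName(rawNames);

% Step 2: resolve any collisions left over after step 1.
safeNames = matlab.lang.makeUniqueStrings(validNames);

% Build a small table of random data using the sanitized names.
T = array2table(randi(6, 3, numel(rawNames)), 'VariableNames', safeNames);

% Keep the original labels next to the sanitized ones.
T.Properties.VariableDescriptions = rawNames;
```

After this, T.Properties.VariableDescriptions returns the untouched labels in column order, so a sanitized column name can always be mapped back to the gene name it came from.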