Unicode polynomial equation

Revision as of 06:46, 21 December 2019 by Wherrera (fractions printing)
This task has been flagged for clarification due to it being believed to be too difficult to implement in a reasonable amount of effort in more than one (usually very specialised) language. It may need to be divided into multiple tasks or modified substantially so that multiple implementations are practical, and that may cause code on this page in its current state to be flagged incorrect afterwards. See this page's Talk page for discussion.

The objective of this task is to parse a difficult (untidy) polynomial and generate a "pretty" representation of the polynomial in Unicode.

Unicode polynomial equation is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

In the target language define a "polynomial" object (or structure or record). Using this object also define the routines for parsing a polynomial as input, and generating a normalised Unicode representation of the polynomial as output.
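As a rough illustration of what such a "polynomial" object might look like, here is a minimal Go sketch. The names `Poly`, `AddTerm` and `superscripts` are hypothetical and not part of the task; the sketch assumes floating-point coefficients and a single variable `x`, and it only covers storage and normalised output, not parsing.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// Poly is a hypothetical "polynomial" object: it maps each exponent to its
// coefficient, so duplicate terms merge and negative exponents are allowed.
type Poly map[int]float64

// AddTerm accumulates coef*x^exp; entries that cancel to zero are dropped.
func (p Poly) AddTerm(coef float64, exp int) {
	p[exp] += coef
	if p[exp] == 0 {
		delete(p, exp)
	}
}

// superscripts translates ASCII digits and '-' into Unicode superscripts.
var superscripts = strings.NewReplacer(
	"0", "⁰", "1", "¹", "2", "²", "3", "³", "4", "⁴",
	"5", "⁵", "6", "⁶", "7", "⁷", "8", "⁸", "9", "⁹", "-", "⁻")

// String renders the terms from highest to lowest exponent.
func (p Poly) String() string {
	if len(p) == 0 {
		return "0"
	}
	exps := make([]int, 0, len(p))
	for e := range p {
		exps = append(exps, e)
	}
	sort.Sort(sort.Reverse(sort.IntSlice(exps)))
	var sb strings.Builder
	for i, e := range exps {
		c := p[e]
		switch {
		case i == 0 && c < 0:
			sb.WriteString("-")
			c = -c
		case i > 0 && c < 0:
			sb.WriteString(" - ")
			c = -c
		case i > 0:
			sb.WriteString(" + ")
		}
		// A coefficient of 1 is only printed for the constant term.
		if c != 1 || e == 0 {
			sb.WriteString(strconv.FormatFloat(c, 'g', -1, 64))
		}
		if e != 0 {
			sb.WriteString("x")
			if e != 1 {
				sb.WriteString(superscripts.Replace(strconv.Itoa(e)))
			}
		}
	}
	return sb.String()
}

func main() {
	p := Poly{}
	p.AddTerm(1, 5)
	p.AddTerm(-2, 4)
	p.AddTerm(42, 3)
	p.AddTerm(0, 2) // zero coefficients vanish
	p.AddTerm(20, 1)
	p.AddTerm(20, 1) // duplicate exponents are summed
	p.AddTerm(1, 0)
	fmt.Println(p) // x⁵ - 2x⁴ + 42x³ + 40x + 1
}
```

Keying the map by exponent is one design choice among several; it makes merging duplicates trivial at the cost of sorting the keys on output.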

Task details:

Given a string containing an untidy Unicode polynomial, e.g.

-0.00x⁺¹⁰ + 1.0·x ** 5 + -2e0x^4 + +0,042.00 × x ⁺³ + +.0x² + 20.000 000 000x¹ - -1x⁺⁰ + .0x⁻¹ + 20.x¹

Coerce (or convert) the string into the "polynomial" object, at the same time normalise the polynomial to a canonical form. The ideal normalised output (in this example) would be:

x⁵ - 2x⁴ + 42x³ + 40x + 1
Specific examples of Unicode and polynomial texts to be parsed as test cases:

Description | Input example test cases
"Zero" coefficients are removed | x⁵ - 2x⁴ + 42x³ + 0x² + 40x + 1
The "0" polynomial case | 0e+0x⁰⁰⁷ + 00e-00x + 0x + .0x⁰⁵ - 0.x⁴ + 0×x³ + 0x⁻⁰ + 0/x + 0/x³ + 0x⁻⁵
"One" coefficients are normalised | 1x⁵ - 2x⁴ + 42x³ + 40x + 1x⁰
Signs are normalised | +x⁺⁵ + -2x⁻⁻⁴ + 42x⁺⁺³ + +40x - -1
ASCII representations are parsed | x^5 - 2x**4 + 42x^3 + 40x + 1
Non-ASCII representations are parsed | x↑5 - 2.00·x⁴ + 42.00·x³ + 40.00·x + 1 (cf. ↑ & ·)
Specifically permit non-polynomials where terms have negative exponents | x⁻⁵ - 2⁄x⁴ + 42x⁻³ + 40/x + 1x⁻⁰ (n.b. Unicode Fraction)
Spaces in numbers and between operators are ignored | x⁵ - 2x⁴ + 42.000 000x³ + 40x + 1
Single commas are ignored in numbers | x⁵ - 2x⁴ + 0,042x³ + 40.000,000x + 1
A coefficient may be duplicated, zero, or missing | 0x⁷ + 10x + 10x + x⁵ - 2x⁴ + 42x³ + 20x + 1
Support scientific notation and optionally support the Unicode Decimal Exponent Symbol U+23E8/⏨ | 1E0x⁵ - 2,000,000.e-6x⁴ + 4.2⏨1x³ + .40e+2x + 1
Support fractions for both input and output. On output round the decimal to the appropriate fraction. | x⁵ - x⁴⁄2 + 405x³⁄4 + 403x⁄4 + 5⁄2
Optionally support Unicode Vulgar fractions for both input and output (¼ ½ ¾ ⅐ ⅑ ⅒ ⅓ ⅔ ⅕ ⅖ ⅗ ⅘ ⅙ ⅚ ⅛ ⅜ ⅝ ⅞ ↉). On output round the decimal to the appropriate fraction. | x⁵ - ½x⁴ + 101¼x³ + 100¾x + 2½

Unicode characters that must be specifically supported are: ⁰ ¹ ² ³ ⁴ ⁵ ⁶ ⁷ ⁸ ⁹ ⁻ ⁺ · × ⁄ ↑ ⏨.
Where · & × are multiplication, and ⁄ is Unicode Fraction.
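The requirement to "round the decimal to appropriate fraction" is not spelled out by the task, so the following Go sketch shows just one possible reading: search for the smallest denominator (up to an assumed limit of 10) whose fraction lies within an assumed tolerance of the value. The function names `toFraction` and `formatFraction`, the limit and the tolerance are all illustrative choices, not part of the task.

```go
package main

import (
	"fmt"
	"math"
)

// toFraction returns the fraction with the smallest denominator <= maxDen
// that lies within eps of v; ok is false if no such fraction exists.
func toFraction(v float64, maxDen int, eps float64) (num, den int, ok bool) {
	for d := 1; d <= maxDen; d++ {
		n := math.Round(v * float64(d))
		if math.Abs(v-n/float64(d)) <= eps {
			return int(n), d, true
		}
	}
	return 0, 0, false
}

// formatFraction prints v as "n" or "n⁄d" (using U+2044 FRACTION SLASH),
// falling back to the plain decimal when no simple fraction fits.
func formatFraction(v float64) string {
	n, d, ok := toFraction(v, 10, 1e-9)
	switch {
	case !ok:
		return fmt.Sprint(v)
	case d == 1:
		return fmt.Sprint(n)
	default:
		return fmt.Sprintf("%d⁄%d", n, d)
	}
}

func main() {
	for _, v := range []float64{0.5, 101.25, 100.75, 2.5, 42} {
		fmt.Println(v, "=>", formatFraction(v))
	}
}
```

With these assumptions 101.25 becomes 405⁄4 and 2.5 becomes 5⁄2, matching the coefficients in the fraction test case above.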

There are (at least) three possible ways of achieving this task.

  • Using an external parsing library.
  • Using a built-in parsing/formatting library.
  • Coding a custom polynomial parsing routine.

Any one, or all, of these approaches is acceptable; each should appear under its own subtitle.

Go

Although this program provides full support for Unicode vulgar fractions, note that there is no guarantee that they (or arithmetic on them) will successfully 'round trip', because floating point arithmetic is used in the underlying calculations and some of the fractions are recurring rather than exact decimals in any case.

<lang go>package main

import (

   "fmt"
   "log"
   "math"
   "regexp"
   "strconv"
   "strings"

)

var powers = strings.NewReplacer(

   "0", "⁰",
   "1", "¹",
   "2", "²",
   "3", "³",
   "4", "⁴",
   "5", "⁵",
   "6", "⁶",
   "7", "⁷",
   "8", "⁸",
   "9", "⁹",
   "-", "⁻",

)

var fractions = [][2]string{

   {".25", "¼"},
   {".5", "½"},
   {".75", "¾"},
   {".14285714285714285", "⅐"},
   {".1111111111111111", "⅑"},
   {".1", "⅒"},
   {".3333333333333333", "⅓"},
   {".6666666666666666", "⅔"},
   {".2", "⅕"},
   {".4", "⅖"},
   {".6", "⅗"},
   {".8", "⅘"},
   {".16666666666666666", "⅙"},
   {".8333333333333334", "⅚"},
   {".125", "⅛"},
   {".375", "⅜"},
   {".625", "⅝"},
   {".875", "⅞"},

}

func printEquation(coefs map[int]float64) {

   fmt.Print("=> ")
   if len(coefs) == 0 {
       fmt.Println("0\n")
       return
   }
   max, min := math.MinInt32, math.MaxInt32
   for k := range coefs {
       if k > max {
           max = k
       }
       if k < min {
           min = k
       }
   }
   for p := max; p >= min; p-- {
       if c := coefs[p]; c != 0 {
           if p < max {
               sign := "+"
               if c < 0 {
                   sign = "-"
                   c = -c
               }
               fmt.Printf(" %s ", sign)
           }
           if c != 1 || (c == 1 && p == 0) {
               cs := fmt.Sprintf("%v", c)
               ix := strings.Index(cs, ".")
               if ix >= 0 {
                   dec := cs[ix:]
                   for _, frac := range fractions {
                       if dec == frac[0] {
                           cs = strings.Replace(cs, dec, frac[1], 1)
                           break
                       }
                   }
               }
               if cs[0] == '0' && len(cs) > 1 && cs[1] != '.' {
                   cs = cs[1:]
               }
               fmt.Print(cs)
           }
           if p != 0 {
               ps := strconv.Itoa(p)
               ps = powers.Replace(ps)
               if ps == "¹" {
                   ps = ""
               }
               fmt.Printf("x%s", ps)
           }
       }
   }
   fmt.Println("\n")

}

func check(err error) {

   if err != nil {
       log.Fatal(err)
   }

}

func main() {

   equs := []string{
       `-0.00x⁺¹⁰ + 1.0·x ** 5 + -2e0x^4 + +0,042.00 × x ⁺³ + +.0x² + 20.000 000 000x¹ - -1x⁺⁰ + .0x⁻¹ + 20.x¹`,
       `x⁵ - 2x⁴ + 42x³ + 0x² + 40x + 1`,
       `0e+0x⁰⁰⁷ + 00e-00x + 0x + .0x⁰⁵ - 0.x⁴ + 0×x³ + 0x⁻⁰ + 0/x + 0/x³ + 0x⁻⁵`,
       `1x⁵ - 2x⁴ + 42x³ + 40x + 1x⁰`,
       `+x⁺⁵ + -2x⁻⁻⁴ + 42x⁺⁺³ + +40x - -1`,
       `x^5 - 2x**4 + 42x^3 + 40x + 1`,
       `x↑5 - 2.00·x⁴ + 42.00·x³ + 40.00·x + 1`,
       `x⁻⁵ - 2⁄x⁴ + 42x⁻³ + 40/x + 1x⁻⁰`,
       `x⁵ - 2x⁴ + 42.000 000x³ + 40x + 1`,
       `x⁵ - 2x⁴ + 0,042x³ + 40.000,000x + 1`,
       `0x⁷ + 10x + 10x + x⁵ - 2x⁴ + 42x³ + 20x + 1`,
       `1E0x⁵ - 2,000,000.e-6x⁴ + 4.2⏨1x³ + .40e+2x + 1`,
       `x⁵ - x⁴⁄2 + 405x³⁄4 + 403x⁄4 + 5⁄2`,
       `x⁵ - 0.5x⁴ + 101.25x³ + 100.75x + 2.5`,
       `x⁻⁵ - 2⁄x⁴ + 42x⁻³ - 40/x`,
       `⅐x⁵ - ⅓x⁴ - ⅔x⁴ + 42⅕x³ + ⅑x - 40⅛ - ⅝`,
   }
   rgx := regexp.MustCompile(`\s+(\+|-)\s+`)
   rep := strings.NewReplacer(
       ",", "",
       " ", "",
       "¼", ".25",
       "½", ".5",
       "¾", ".75",
       "⅐", ".14285714285714285",
       "⅑", ".1111111111111111",
       "⅒", ".1",
       "⅓", ".3333333333333333",
       "⅔", ".6666666666666666",
       "⅕", ".2",
       "⅖", ".4",
       "⅗", ".6",
       "⅘", ".8",
       "⅙", ".16666666666666666",
       "⅚", ".8333333333333334",
       "⅛", ".125",
       "⅜", ".375",
       "⅝", ".625",
       "⅞", ".875",
       "↉", ".0",
       "⏨", "e",
       "⁄", "/",
   )
   rep2 := strings.NewReplacer(
       "⁰", "0",
       "¹", "1",
       "²", "2",
       "³", "3",
       "⁴", "4",
       "⁵", "5",
       "⁶", "6",
       "⁷", "7",
       "⁸", "8",
       "⁹", "9",
       "⁻⁻", "",
       "⁻", "-",
       "⁺", "",
       "**", "",
       "^", "",
       "↑", "",
       "⁄", "/",
   )
   var err error
   for _, equ := range equs {
       fmt.Println(equ)
       terms := rgx.Split(equ, -1)
       ops := rgx.FindAllString(equ, -1)
       for i := 0; i < len(ops); i++ {
           ops[i] = strings.TrimSpace(ops[i])
       }
       coefs := make(map[int]float64)
       for i, term := range terms {
           s := strings.Split(term, "x")
           t := s[0]
           t = strings.TrimRight(t, "·× ")
           t = rep.Replace(t)
           c := 1.0
           inverse := false
           if t != "" {
               if t == "+" || t == "-" {
                   t += "1"
               }
               ix := strings.Index(t, "/")
               if ix == len(t)-1 {
                   inverse = true
                   t = t[0 : len(t)-1]
                   c, err = strconv.ParseFloat(t, 64)
                   check(err)
               } else if ix >= 0 {
                   u := strings.Split(t, "/")
                   m, err := strconv.ParseFloat(u[0], 64)
                   check(err)
                   n, err := strconv.ParseFloat(u[1], 64)
                   check(err)
                   c = m / n
               } else {
                   c, err = strconv.ParseFloat(t, 64)
                   check(err)
               }
           }
           // Apply the preceding binary operator even when the
           // coefficient is implicit (e.g. the "- x⁴⁄2" term).
           if i > 0 && ops[i-1] == "-" {
               c = -c
           }
           if c == 0 {
               c = 0 // normalise negative zero
           }
           if len(s) == 1 {
               coefs[0] += c
               continue
           }
           u := s[1]
           u = strings.TrimSpace(u)
           if u == "" {
               p := 1
               if inverse {
                   p = -1
               }
               if c != 0 {
                   coefs[p] += c
               }
               continue
           }
           u = rep2.Replace(u)
           jx := strings.Index(u, "/")
           p := 1
           if jx >= 0 {
               v := strings.Split(u, "/")
               // An empty exponent before the slash (e.g. "403x⁄4") means x¹.
               if v[0] != "" {
                   p, _ = strconv.Atoi(v[0])
               }
               d, err := strconv.ParseFloat(v[1], 64)
               check(err)
               c /= d
           } else {
               p, _ = strconv.Atoi(strings.TrimSpace(u))
           }
           if inverse {
               p = -p
           }
           if c != 0 {
               coefs[p] += c
           }
       }
       printEquation(coefs)
   }

}</lang>

Output:
-0.00x⁺¹⁰ + 1.0·x ** 5 + -2e0x^4 + +0,042.00 × x ⁺³ + +.0x² + 20.000 000 000x¹ - -1x⁺⁰ + .0x⁻¹ + 20.x¹
=> x⁵ - 2x⁴ + 42x³ + 40x + 1

x⁵ - 2x⁴ + 42x³ + 0x² + 40x + 1
=> x⁵ - 2x⁴ + 42x³ + 40x + 1

0e+0x⁰⁰⁷ + 00e-00x + 0x + .0x⁰⁵ - 0.x⁴ + 0×x³ + 0x⁻⁰ + 0/x + 0/x³ + 0x⁻⁵
=> 0

1x⁵ - 2x⁴ + 42x³ + 40x + 1x⁰
=> x⁵ - 2x⁴ + 42x³ + 40x + 1

+x⁺⁵ + -2x⁻⁻⁴ + 42x⁺⁺³ + +40x - -1
=> x⁵ - 2x⁴ + 42x³ + 40x + 1

x^5 - 2x**4 + 42x^3 + 40x + 1
=> x⁵ - 2x⁴ + 42x³ + 40x + 1

x↑5 - 2.00·x⁴ + 42.00·x³ + 40.00·x + 1
=> x⁵ - 2x⁴ + 42x³ + 40x + 1

x⁻⁵ - 2⁄x⁴ + 42x⁻³ + 40/x + 1x⁻⁰
=> 1 + 40x⁻¹ + 42x⁻³ - 2x⁻⁴ + x⁻⁵

x⁵ - 2x⁴ + 42.000 000x³ + 40x + 1
=> x⁵ - 2x⁴ + 42x³ + 40x + 1

x⁵ - 2x⁴ + 0,042x³ + 40.000,000x + 1
=> x⁵ - 2x⁴ + 42x³ + 40x + 1

0x⁷ + 10x + 10x + x⁵ - 2x⁴ + 42x³ + 20x + 1
=> x⁵ - 2x⁴ + 42x³ + 40x + 1

1E0x⁵ - 2,000,000.e-6x⁴ + 4.2⏨1x³ + .40e+2x + 1
=> x⁵ - 2x⁴ + 42x³ + 40x + 1

x⁵ - x⁴⁄2 + 405x³⁄4 + 403x⁄4 + 5⁄2
=> x⁵ - ½x⁴ + 101¼x³ + 100¾x + 2½

x⁵ - 0.5x⁴ + 101.25x³ + 100.75x + 2.5
=> x⁵ - ½x⁴ + 101¼x³ + 100¾x + 2½

x⁻⁵ - 2⁄x⁴ + 42x⁻³ - 40/x
=> -40x⁻¹ + 42x⁻³ - 2x⁻⁴ + x⁻⁵

⅐x⁵ - ⅓x⁴ - ⅔x⁴ + 42⅕x³ + ⅑x - 40⅛ - ⅝
=> ⅐x⁵ - x⁴ + 42⅕x³ + ⅑x - 40¾
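
The round-trip caveat mentioned for this solution comes from storing coefficients as `float64`. One way to sidestep it, sketched here under the assumption that exact rational coefficients are acceptable, is Go's `math/big.Rat`. The helper `parseMixed` and the small `vulgar` table are hypothetical and cover only a few of the vulgar fractions; the sketch also does not validate that the integer part precedes the fraction.

```go
package main

import (
	"fmt"
	"math/big"
)

// vulgar maps a few Unicode vulgar fractions to exact rationals
// (an illustrative subset, not the full list from the task).
var vulgar = map[rune]*big.Rat{
	'⅓': big.NewRat(1, 3),
	'⅔': big.NewRat(2, 3),
	'⅛': big.NewRat(1, 8),
	'⅝': big.NewRat(5, 8),
}

// parseMixed parses strings such as "40⅛", "⅔" or "7" into an exact rational.
func parseMixed(s string) (*big.Rat, error) {
	total := new(big.Rat)
	whole := ""
	for _, r := range s {
		if f, ok := vulgar[r]; ok {
			total.Add(total, f)
		} else {
			whole += string(r)
		}
	}
	if whole != "" {
		w, ok := new(big.Rat).SetString(whole)
		if !ok {
			return nil, fmt.Errorf("cannot parse %q", s)
		}
		total.Add(total, w)
	}
	return total, nil
}

func main() {
	a, _ := parseMixed("40⅛")
	b, _ := parseMixed("⅝")
	fmt.Println(new(big.Rat).Add(a, b).RatString()) // 163/4, i.e. 40¾

	x, _ := parseMixed("⅓")
	y, _ := parseMixed("⅔")
	fmt.Println(new(big.Rat).Add(x, y).RatString()) // 1
}
```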


Julia

The task allows the "polynomials" to be parsed to have negative exponents. This makes them Laurent polynomials, not ordinary polynomials.

<lang julia>import Base.print

struct LaurentPolynomial{T}

   powtocoef::Dict{Int,T}
   varname::Char

end

function tosuper(i)

   s = ""
   if i != 1
       if i < 0
           s = "\u207b"
           i = -i
       end
       s *= prod([reverseudi[x] for x in reverse(digits(i))])
   end
   return s

end

function print(io::IO, lp::LaurentPolynomial)

   if isempty(lp.powtocoef) || all(iszero, values(lp.powtocoef))
       print(0.0)
       return
   end
   firstterm = true
   dorev = minimum(keys(lp.powtocoef)) >= 0
   for p in sort!(collect(lp.powtocoef), lt =(a,b)->a[1]<b[1], rev=dorev)
       e, c = p[1], p[2]
       if c != 0
           if !firstterm
               print(c < 0 ? " - " : " + ")
               c = abs(c)
           end
           sfrac = get(tofracs, c - trunc(c), "")
           sint = string(Int(trunc(c)))
           s = round(c) == c ? sint : sfrac == "" ? string(c) : 
               sint == "0" ? sfrac : sint * sfrac
           print(c == 1 && e != 0 ? "" : s)
           e != 0 && print(lp.varname, tosuper(e))
           firstterm = false
       end
   end

end

const uniexp = [['\u2070', '\u00b9', '\u00b2', '\u00b3']; collect('\u2074':'\u2079')]
const allsuper = String(vcat(uniexp, ['\u207a', '\u207b']))
const udi = Dict([u => i-1 for (i, u) in enumerate(uniexp)])
const reverseudi = Dict([v => k for (k, v) in udi])
const uch = Dict(['\u207a' => '+', '\u207b' => '-', '\u00b7' => ' ',
   '\u00d7' => ' ', '\u2091' => '^', '\u23e8' => 'E'])

const fracs = Dict(['¼' => "$(1/4)", '½' => "$(1/2)", '¾' => "$(3/4)", '⅐' => "$(1/7)",
   '⅑' => "$(1/9)", '⅒' => "$(1/10)", '⅓' => "$(1/3)", '⅔' => "$(2/3)", '⅕' => "$(1/5)",
   '⅖' => "$(2/5)", '⅗' => "$(3/5)", '⅘' => "$(4/5)", '⅙' => "$(1/6)", '⅚' => "$(5/6)",
   '⅛' => "$(1/8)", '⅜' => "$(3/8)", '⅝' => "$(5/8)", '⅞' => "$(7/8)", '↉' => "$(0/3)"])

const tofracs = Dict(0.25 => '¼', 0.5 => '½', 0.75 => '¾', 0.2 => '⅕')
allfrac = join(vcat(collect(keys(fracs))), "|")

utoascii(c) = (x = haskey(udi, c) ? Char('0' + udi[c]) : haskey(uch, c) ? uch[c] : c)

function fcoef(termstring, varname)

   m = match(r"(\d+)⁄(\d+)", termstring).captures
   return "$(parse(Float64, m[1])/parse(Float64, m[2]))$varname^0"

end

function xcoef(termstring, varname)

   m = match(Regex("(\\d+)$varname([^⁄\\+]*)⁄(\\d+)"), termstring).captures
   coe = parse(Float64, m[1])/parse(Float64, m[3])
   return "$(coe)$varname$(m[2])"

end

function fromvfrac(termstring)

   m = match(r"(\d*)(\D+)", termstring).captures
   s = string((m[1] == "" ? 0 : parse(Float64, m[1])) + parse(Float64, fracs[m[2][1]]))

end

function normalizeexpression(s, varname='x')

   s = replace(s, Regex("\\d*(" * allfrac * ")") => (x) -> fromvfrac(x))
   s = replace(replace(s, r"[^\+].+" => (x) -> "+$x"), r".+[^\+]$" => (x) -> "$x+")
   s = replace(replace(replace(s, r"\s+" => ""), r"\*\*|↑" => "^"), "\u23e8" => "e")
   s = replace(s, Regex(varname * "(?=[$allsuper]+)") => "$varname^")
   s = replace(prod([utoascii(c) for c in s]), r"\s+" => "")
   s = replace(s, r"[\-\+]+" => (x) -> isodd(count(y -> y == '-', x)) ? "+-" : "+")
   s = replace(replace(s, r"\^\+\-" => "^-"), r"\^\+" => "^")
   s = replace(s, Regex("(?<=[\\-\\+])$varname") => "1$varname")
   s = replace(s, r"(\d+)x([^⁄\+]*)⁄(\d+)" => (x) -> xcoef(x, varname))
   s = replace(s, r"(?<=[0-9])(,)(?=[0-9])" => "")
   s = replace(s, r"\d+⁄\d+" => (x) -> fcoef(x, varname))
   s = replace(s, r"\+([\d\.]+)\+" => SubstitutionString("+\\1$varname^0+"))
   return replace(s, Regex("(⁄|/)(" * varname * ")\\^?") =>
       (x) -> varname * (x[end] == '^' ? "^-" : "^-1"))

end

function topoly(s::String, varname='x')

   allcoef = Dict{Int,Float64}(0 => 0.0)
   s = normalizeexpression(s, varname)
   reg = Regex("[\\.\\-\\+\\deE\\/]*" * varname * "(?:\\^[\\d\\-]+)?(?=(?:[^\\+\\-]?(?:\\+|\\-)))")
   matched = collect(eachmatch(reg, s))
   pairs = [split(x.match, varname * "^") for x in matched]
   for p in pairs
       p[1] = replace(p[1], "+-" => "-")
       if length(p) == 1
           push!(p, "1")
           p[1] = replace(p[1], Regex(varname * "\$") => "")
       end
       coef, expo = tryparse(Float64, p[1]), tryparse(Int, p[2])
       coef = coef == nothing ? 1.0 : coef
       expo = expo == nothing ? 1 : expo
       if haskey(allcoef, expo)
           coef += allcoef[expo]
       end
       allcoef[expo] = coef
   end
   return LaurentPolynomial(allcoef, varname)

end

testcases = [
   "1x⁵ - 2x⁴ + 42x³ + 0x² + 40x + 1",
   "0e+0x⁰⁰⁷ + 00e-00x + 0x + .0x⁰⁵ - 0.x⁴ + 0×x³ + 0x⁻⁰ + 0/x + 0/x³ + 0x⁻⁵",
   "1x⁵ - 2x⁴ + 42x³ + 40x + 1x⁰",
   "+x⁺⁵ + -2x⁻⁻⁴ + 42x⁺⁺³ + +40x - -1",
   "x^5 - 2x**4 + 42x^3 + 40x + 1",
   "x↑5 - 2.00·x⁴ + 42.00·x³ + 40.00·x + 1",
   "x⁻⁵ - 2⁄x⁴ + 42x⁻³ + 40/x + 1x⁻⁰",
   "x⁵ - 2x⁴ + 42.000 000x³ + 40x + 1",
   "x⁵ - 2x⁴ + 0,042x³ + 40.000,000x + 1",
   "0x⁷ + 10x + 10x + x⁵ - 2x⁴ + 42x³ + 20x + 1",
   "1E0x⁵ - 2,000,000.e-6x⁴ + 4.2⏨1x³ + .40e+2x + 1",
   "x⁵ - x⁴⁄2 + 405x³⁄4 + 403x⁄4 + 5⁄2",
   "x⁵ - ½x⁴ + 101¼x³ + 100¾x + 2½",
]

for s in testcases

   println(lpad(s, 48), "  =>  ", topoly(s, 'x'))

end

</lang>

Output:
                1x⁵ - 2x⁴ + 42x³ + 0x² + 40x + 1  =>  x⁵ - 2x⁴ + 42x³ + 40x + 1
0e+0x⁰⁰⁷ + 00e-00x + 0x + .0x⁰⁵ - 0.x⁴ + 0×x³ + 0x⁻⁰ + 0/x + 0/x³ + 0x⁻⁵  =>  0.0
                    1x⁵ - 2x⁴ + 42x³ + 40x + 1x⁰  =>  x⁵ - 2x⁴ + 42x³ + 40x + 1
              +x⁺⁵ + -2x⁻⁻⁴ + 42x⁺⁺³ + +40x - -1  =>  x⁵ - 2x⁴ + 42x³ + 40x + 1
                   x^5 - 2x**4 + 42x^3 + 40x + 1  =>  x⁵ - 2x⁴ + 42x³ + 40x + 1
          x↑5 - 2.00·x⁴ + 42.00·x³ + 40.00·x + 1  =>  x⁵ - 2x⁴ + 42x³ + 40x + 1
                x⁻⁵ - 2⁄x⁴ + 42x⁻³ + 40/x + 1x⁻⁰  =>  x⁻⁵ - 2x⁻⁴ + 42x⁻³ + 40x⁻¹ + 1
               x⁵ - 2x⁴ + 42.000 000x³ + 40x + 1  =>  x⁵ - 2x⁴ + 42x³ + 40x + 1
            x⁵ - 2x⁴ + 0,042x³ + 40.000,000x + 1  =>  x⁵ - 2x⁴ + 42x³ + 40x + 1
     0x⁷ + 10x + 10x + x⁵ - 2x⁴ + 42x³ + 20x + 1  =>  x⁵ - 2x⁴ + 42x³ + 40x + 1
 1E0x⁵ - 2,000,000.e-6x⁴ + 4.2⏨1x³ + .40e+2x + 1  =>  x⁵ - 2x⁴ + 42x³ + 40x + 1
              x⁵ - x⁴⁄2 + 405x³⁄4 + 403x⁄4 + 5⁄2  =>  x⁵ - ½x⁴ + 101¼x³ + 100¾x + 2½
                  x⁵ - ½x⁴ + 101¼x³ + 100¾x + 2½  =>  x⁵ - ½x⁴ + 101¼x³ + 100¾x + 2½


Phix

To simplify this task I first created a test file (save as utf8, Unicode_polynomial_equation.txt):

-0.00x⁺¹⁰ + 1.0·x ** 5 + -2e0x^4 + +0,042.00 × x ⁺³ + +.0x² + 20.000 000 000x¹ - -1x⁺⁰ + .0x⁻¹ + 20.x¹
==> x⁵ - 2x⁴ + 42x³ + 40x + 1
===> x^5 - 2x^4 + 42x^3 + 40x + 1
x⁵ - 2x⁴ + 42x³ + 0x² + 40x + 1
==> x⁵ - 2x⁴ + 42x³ + 40x + 1
0e+0x⁰⁰⁷ + 00e-00x + 0x + .0x⁰⁵ - 0.x⁴ + 0×x³ + 0x⁻⁰ + 0/x + 0/x³ + 0x⁻⁵
==> 0
1x⁵ - 2x⁴ + 42x³ + 40x + 1x⁰
==> x⁵ - 2x⁴ + 42x³ + 40x + 1
+x⁺⁵ + -2x⁻⁻⁴ + 42x⁺⁺³ + +40x - -1
==> x⁵ - 2x⁴ + 42x³ + 40x + 1
x^5 - 2x**4 + 42x^3 + 40x + 1
==> x⁵ - 2x⁴ + 42x³ + 40x + 1
x↑5 - 2.00·x⁴ + 42.00·x³ + 40.00·x + 1
==> x⁵ - 2x⁴ + 42x³ + 40x + 1
x⁻⁵ - 2⁄x⁴ + 42x⁻³ + 40/x + 1x⁻⁰
==> 1 + 40x⁻¹ + 42x⁻³ - 2x⁻⁴ + x⁻⁵
===> 1 + 40x^-1 + 42x^-3 - 2x^-4 + x^-5
x⁵ - 2x⁴ + 42.000 000x³ + 40x + 1
==> x⁵ - 2x⁴ + 42x³ + 40x + 1
x⁵ - 2x⁴ + 0,042x³ + 40.000,000x + 1
==> x⁵ - 2x⁴ + 42x³ + 40x + 1
0x⁷ + 10x + 10x + x⁵ - 2x⁴ + 42x³ + 20x + 1
==> x⁵ - 2x⁴ + 42x³ + 40x + 1
1E0x⁵ - 2,000,000.e-6x⁴ + 4.2⏨1x³ + .40e+2x + 1
==> x⁵ - 2x⁴ + 42x³ + 40x + 1
x⁵ - x⁴⁄2 + 405x³⁄4 + 403x⁄4 + 5⁄2
==> x⁵ - ½x⁴ + 101¼x³ + 100¾x + 2½
===> x^5 - 0.5x^4 + 101.25x^3 + 100.75x + 2.5
x⁵ - ½x⁴ + 101¼x³ + 100¾x + 2½
==> x⁵ - ½x⁴ + 101¼x³ + 100¾x + 2½
x<sup>5</sup> - 2x<sup>4</sup> + 42x<sup>3</sup> + 40x + 1
==> x⁵ - 2x⁴ + 42x³ + 40x + 1
x^5 - 2x^4 + 42x^3 + 40x + 1
==> x⁵ - 2x⁴ + 42x³ + 40x + 1
x⁵ - 2x⁴ + 42x³ + 40x + 1
==> x⁵ - 2x⁴ + 42x³ + 40x + 1
x5 - 2x4 + 42x3 + 40x + 1
==> x⁵ - 2x⁴ + 42x³ + 40x + 1

Explanation:
On finding a line beginning with "==> ", parse the previous line and check that the output matches the rest of the line.
Lines beginning "===> " contain alternative (valid) representations of the previous line, for use when uni_frac is false. Note these equivalences can then be used further on, eg the result on line 5 maps to line 3 via 2, not that it matters should the same equivalent be defined twice, and also note that "==> " should always be the unicode version and "===> " the ascii one.
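
The "==> " convention described above is independent of Phix. As a hedged illustration, the following Go sketch applies the same rule to a slice of lines; `checkLines` is a hypothetical helper, the normaliser is passed in as a function, and the one in `main` is a deliberately trivial stand-in rather than a real polynomial parser. Lines beginning "===> " are simply skipped here instead of being used as ASCII alternatives.

```go
package main

import (
	"fmt"
	"strings"
)

// checkLines walks the lines of a test file. Whenever a line starts with
// "==> ", it normalises the preceding line and compares the result with the
// rest of the "==> " line. It returns the 1-based numbers of the input lines
// whose normalised form did not match.
func checkLines(lines []string, normalise func(string) string) []int {
	var failed []int
	for i := 1; i < len(lines); i++ {
		if !strings.HasPrefix(lines[i], "==> ") {
			continue // inputs and "===> " alternatives are not checked here
		}
		expected := strings.TrimPrefix(lines[i], "==> ")
		if normalise(lines[i-1]) != expected {
			failed = append(failed, i) // i is the 1-based number of lines[i-1]
		}
	}
	return failed
}

func main() {
	// Stand-in normaliser (an assumption for the demo): it only drops "+ 0x²".
	stub := func(s string) string { return strings.ReplaceAll(s, " + 0x²", "") }
	lines := []string{
		"x⁵ - 2x⁴ + 42x³ + 0x² + 40x + 1",
		"==> x⁵ - 2x⁴ + 42x³ + 40x + 1",
		"===> x^5 - 2x^4 + 42x^3 + 40x + 1",
		"x⁵ + 0x² + 1",
		"==> x⁵ + 2",
	}
	fmt.Println(checkLines(lines, stub)) // [4]
}
```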

Obviously this is not well tested and //will// fail on the next thing thrown at it, but it shows the basic approach. Error handling omitted using that good ol' standby excuse "for clarity".

Note that space is not skipped when getting exponents, otherwise it might wrongly treat say "x + 2" as "x^2", since we /are/ allowing exponentiation symbols to be omitted. At the start of parse() we strip spaces around '^' (etc) to help with that.

Processing the input in utf32 form is a complete no-brainer; whereas constructing utf8 output is a little more subtle, but avoids creating a utf32 "string" just because all bytes happen to be less than 255, which would not necessarily be valid utf8/32.
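
The same split between code-point input and UTF-8 output can be illustrated in Go, where a `[]rune` plays the role of the utf32 sequence and a `strings.Builder` assembles the UTF-8 result. The sketch below is not a translation of the Phix code; `parseSuper`, `formatSuper` and `superDigit` are hypothetical helpers that handle only superscript exponents and, like the approach described above, do not skip spaces.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// superDigits[d] is the Unicode superscript for decimal digit d.
var superDigits = []rune("⁰¹²³⁴⁵⁶⁷⁸⁹")

// superDigit returns the digit value of a superscript rune, or -1.
func superDigit(r rune) int {
	for d, s := range superDigits {
		if r == s {
			return d
		}
	}
	return -1
}

// parseSuper reads superscript signs and digits from the start of rs and
// returns the exponent plus the number of runes consumed. Spaces are not
// skipped, so "x + 2" is never mistaken for an exponent.
func parseSuper(rs []rune) (value, n int) {
	sign := 1
	for n < len(rs) && (rs[n] == '⁺' || rs[n] == '⁻') {
		if rs[n] == '⁻' {
			sign = -sign
		}
		n++
	}
	for n < len(rs) {
		d := superDigit(rs[n])
		if d < 0 {
			break
		}
		value = value*10 + d
		n++
	}
	return sign * value, n
}

// formatSuper renders an exponent as UTF-8 superscript text.
func formatSuper(e int) string {
	var sb strings.Builder
	if e < 0 {
		sb.WriteRune('⁻')
		e = -e
	}
	for _, ch := range strconv.Itoa(e) {
		sb.WriteRune(superDigits[ch-'0'])
	}
	return sb.String()
}

func main() {
	v, n := parseSuper([]rune("⁻⁻⁴ + 1"))
	fmt.Println(v, n)             // 4 3
	fmt.Println(formatSuper(-10)) // ⁻¹⁰
}
```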

<lang Phix>-- demo\rosetta\Unicode_polynomial_equation.exw
constant uni_frac = false   -- if true output unicode superscripts and vulgar fractions

constant UTF8BOM = {#EF,#BB,#BF},

        SPTWO = #00B2,         -- superscript two
        STHRE = #00B3,         -- superscript three
        MIDOT = #00B7,         -- middle dot
        SPONE = #00B9,         -- superscript one
        QUART = #00BC,         -- one quarter
        AHALF = #00BD,         -- one half
        THQTR = #00BE,         -- three quarters
        MULTY = #00D7,         -- multiplication sign
        DIVDE = #2044,         -- division sign
        SZERO = #2070,         -- superscript zero
        SFOUR = #2074,         -- superscript four
        SFIVE = #2075,         -- superscript five
        SPSIX = #2076,         -- superscript six
        SSEVN = #2077,         -- superscript seven
        SEGHT = #2078,         -- superscript eight
        SNINE = #2079,         -- superscript nine
        SPLUS = #207A,         -- superscript plus
        SMNUS = #207B,         -- superscript minus
        SVNTH = #2150,         -- one seventh
        NINTH = #2151,         -- one ninth
        TENTH = #2152,         -- one tenth
        THIRD = #2153,         -- one third
        TWTHD = #2154,         -- two thirds
        FIFTH = #2155,         -- one fifth
        TWFTH = #2156,         -- two fifths
        THFTH = #2157,         -- three fifths
        FRFTH = #2158,         -- four fifths
        SIXTH = #2159,         -- one sixth
        FVSIX = #215A,         -- five sixths
        EIGTH = #215B,         -- one eighth
        THEGH = #215C,         -- three eighths
        FVEGH = #215D,         -- five eighths

-- ZTHRD = #2189, -- zero thirds[??]

        UPARW = #2191,         -- uparrow
        BASET = #23E8,         -- base 10
        SPACE = ' ',           -- space
            T = 10,            -- align nxt tbl

$

constant {vulgar_fractions,unicode_vulgar_fractions} = columnize({{{1,4},QUART},

                                                                 {{1,2},AHALF},
                                                                 {{3,4},THQTR},
                                                                 {{1,7},SVNTH},
                                                                 {{1,9},NINTH},
                                                                 {{1,T},TENTH},
                                                                 {{1,3},THIRD},
                                                                 {{2,3},TWTHD},
                                                                 {{1,5},FIFTH},
                                                                 {{2,5},TWFTH},
                                                                 {{3,5},THFTH},
                                                                 {{4,5},FRFTH},
                                                                 {{1,6},SIXTH},
                                                                 {{5,6},FVSIX},
                                                                 {{1,8},EIGTH},
                                                                 {{3,8},THEGH},
                                                                 {{5,8},FVEGH}})

constant EXPONENTS = {SZERO,SPONE,SPTWO,STHRE,SFOUR,SFIVE,SPSIX,SSEVN,SEGHT,SNINE}

function skip(sequence s, integer sdx, sequence set)

   while sdx<=length(s) 
     and find(s[sdx],set) do
       sdx += 1
   end while
   return sdx

end function

function get_sign(sequence s, integer sdx, bool allow_superscripts)
integer sgn = +1, ch

   for sdx=sdx to length(s) do
       ch = s[sdx]
       if allow_superscripts then
           ch = iff(ch=SPLUS?'+':
                iff(ch=SMNUS?'-':
                iff(ch=SPACE?'?':ch))) -- (do not skip spaces, see note)
       end if
       if ch!='+'
       and ch!=' ' then
           if ch!='-' then exit end if
           sgn *= -1
       end if
   end for
   return {sgn,sdx}

end function

function get_num(sequence s, integer sdx, atom n=0, bool allow_superscripts=false, as_fraction=false)
integer sgn = +1, ch, f, e10, d = 1
atom p10 = 10
bool none = not as_fraction -- (cope with "x" == "1x^1" and != "0x^0")
                            -- (but not when processing the "34" of "12.34", obvs)
   if not as_fraction then
       {sgn,sdx} = get_sign(s,sdx,allow_superscripts)
   end if
   if not allow_superscripts then
       sdx = skip(s,sdx,{' '}) 
   end if
   for sdx=sdx to length(s) do
       ch = s[sdx]
       if ch>='0' and ch<='9' then
           if as_fraction then
               n += (ch-'0')/p10
               p10 *= 10
           else
               n = n*10+ch-'0'
           end if
           none = false
       elsif allow_superscripts then
           f = find(ch,EXPONENTS)
           if f=0 then exit end if
           n = n*10+f-1
           none = false
       elsif not find(ch," ,") then
           exit
       end if
   end for
   if not allow_superscripts then
       if find(ch,{'e','E',BASET}) then
           {e10,f,sdx} = get_num(s,sdx+1)
           if f!=1 then ?9/0 end if
           n *= power(10,e10)
       elsif ch='.' then
           if as_fraction then ?9/0 end if
           {n,f,sdx} = get_num(s,sdx+1,n,as_fraction:=true)
           if f!=1 then ?9/0 end if
           none = false
       else
           f = find(ch,unicode_vulgar_fractions)
           if f!=0 then
               if as_fraction then ?9/0 end if
               integer {vfn,vfd} = vulgar_fractions[f]
               if uni_frac then
                   n = n*vfd + vfn
                   if d!=1 then ?9/0 end if
                   d = vfd
               else
                   n += vfn/vfd
               end if
               sdx += 1
               none = false
           end if
       end if
   end if
   if none then n = 1 end if
   n *= sgn
   return {n,d,sdx}

end function

function get_term(sequence s, integer sdx)
integer last_sdx = sdx, -- (infinite loop check/prevent)
        e = 0
atom c, d = 1, f
bool sdiv = false

   {c,d,sdx} = get_num(s,sdx)
   sdx = skip(s,sdx,{' ',MIDOT,MULTY})
   if sdx<=length(s) 
   and (s[sdx]='/' or
        s[sdx]=DIVDE) then
       sdx += 1
       if sdx<=length(s) and s[sdx]!='x' then
           {d,f,sdx} = get_num(s,sdx)
           if f!=1 then ?9/0 end if
           if not uni_frac then
               c /= d
               d = 1
           end if
       else
           sdiv = true
       end if
   end if
   if sdx<=length(s) and s[sdx]='x' then
       sdx = skip(s,sdx+1,{'^','*',SPLUS,UPARW})
       {e,f,sdx} = get_num(s,sdx,allow_superscripts:=true)
       if f!=1 then ?9/0 end if
       if sdiv then e = -e end if
   else
       if sdiv then ?9/0 end if
   end if
   if sdx<=length(s) 
   and (s[sdx]='/' or
        s[sdx]=DIVDE) then
       if d!=1 then ?9/0 end if
       {d,f,sdx} = get_num(s,sdx+1)
       if f!=1 then ?9/0 end if
       if not uni_frac then
           c /= d
           d = 1
       end if
   end if
   if sdx=last_sdx then ?9/0 end if
   return {e,c,d,sdx}

end function

function unicode_superscripts(integer e)
-- display helper
string res = ""

   if e>9 then
       res = unicode_superscripts(floor(e/10))
       e = remainder(e,10)
   end if
   res &= utf32_to_utf8({EXPONENTS[e+1]})
   return res

end function

enum EXP, COEF, FRAC -- contents of terms[i]

function poly(sequence terms)
-- display helper
string r = ""

   for t=length(terms) to 1 by -1 do
       {integer e, atom c, integer f} = terms[t]
       if c!=0 then
           if c=1 and f=1 and e!=0 then
               r &= iff(r=""? "":" + ")
           elsif c=-1 and f=1 and e!=0 then
               r &= iff(r=""?"-":" - ")
           else
               if r!="" then
                   r &= iff(c<0?" - ":" + ")
                   c = abs(c)
               end if
               if f!=1 then    -- (hence/only when uni_frac==true)
                   integer k = find({remainder(c,f),f},vulgar_fractions)
                   if k then
                       c = floor(c/f)
                       if c!=0 then r &= sprintf("%d",c) end if
                       r &= utf32_to_utf8({unicode_vulgar_fractions[k]})
                   else
                       r &= sprintf("%g",c/f)
                   end if
               else
                   r &= sprintf("%g",c)
               end if
           end if
           if e!=0 then
               r &= 'x'
               if e!=1 then
                   if uni_frac then
                       if e<0 then
                           r &= utf32_to_utf8({SMNUS})
                           e = -e
                       end if
                       r &= unicode_superscripts(e)
                   else
                       r &= sprintf("^%d",e)
                   end if
               end if
           end if
       end if
   end for
   if r="" then r="0" end if
   return r

end function

function parse(sequence s)
sequence terms = {}
integer sdx = 1, e, f
atom c

   s = match_replace("<sup>",s,"^")
   s = match_replace("</sup>",s,"")
   s = match_replace("**",s,"^")
   s = match_replace(" ^",s,"^")
   s = match_replace("^ ",s,"^")
   s = match_replace({' ',SPLUS},s,{SPLUS})
   s = match_replace({' ',SMNUS},s,{SMNUS})
   for i=1 to length(EXPONENTS) do
       e = EXPONENTS[i]
       s = match_replace({' ',e},s,{e})
   end for 
   while sdx<=length(s) do
       {e,c,f,sdx} = get_term(s,sdx)
       if c!=0 then -- (aside: +5 -5 may yet leave c==0)
           terms = append(terms,{e,c,f})
       end if
   end while
   terms = sort(terms)
   -- merge, eg "10x^2 + 10x^2" -> 20x^2
   for i=length(terms) to 2 by -1 do
       if terms[i][EXP] = terms[i-1][EXP] then
           if terms[i-1][FRAC]!=terms[i][FRAC] then
               ?9/0 -- placeholder for more code
           else
               terms[i-1][COEF] += terms[i][COEF]
           end if
           terms[i..i] = {} -- (delete 2nd)
       end if
   end for
   return poly(terms)

end function

sequence alts = {}, -- (unicode versions)

        altn = {},     -- (idx of ascii equivalents)
        lines = read_lines("Unicode_polynomial_equation.txt")

if lines[1][1..3] = UTF8BOM then

   -- remove/ignore any utf8 byte order mark
   lines[1] = lines[1][4..$]

end if

for i=2 to length(lines) do

   if length(lines[i])>5 
   and lines[i][1..5] = "===> " then
       alts = append(alts,lines[i-1][5..$])
       altn = append(altn,i)
   end if

end for

for i=2 to length(lines) do

   if length(lines[i])>4 
   and lines[i][1..4] = "==> " then
       sequence line = utf8_to_utf32(lines[i-1])
       sequence res = parse(line)
       sequence expected = lines[i][5..$]
       if res=expected then
           -- (res is the unicode version)
           if platform()!=WINDOWS or res="0" then
               printf(1,"%2d: %40s   ok\n",{i-1,res})
           else
               -- (unicode output on windows consoles is fiddly...)
               printf(1,"%2d: ok\n",i-1)
           end if
       else
           integer k = find(expected,alts)
           if k and res=lines[altn[k]][6..$] then
               -- (res is the ascii equivalent)
               printf(1,"%2d: %40s   ok\n",{i-1,res})
           else
               printf(1,"%d: error - %s\n",{i-1,res})
           end if
       end if
   end if

end for</lang>

Output:

uni_frac = false

 1:             x^5 - 2x^4 + 42x^3 + 40x + 1   ok
 4:             x^5 - 2x^4 + 42x^3 + 40x + 1   ok
 6:                                        0   ok
 8:             x^5 - 2x^4 + 42x^3 + 40x + 1   ok
10:             x^5 - 2x^4 + 42x^3 + 40x + 1   ok
12:             x^5 - 2x^4 + 42x^3 + 40x + 1   ok
14:             x^5 - 2x^4 + 42x^3 + 40x + 1   ok
16:       1 + 40x^-1 + 42x^-3 - 2x^-4 + x^-5   ok
19:             x^5 - 2x^4 + 42x^3 + 40x + 1   ok
21:             x^5 - 2x^4 + 42x^3 + 40x + 1   ok
23:             x^5 - 2x^4 + 42x^3 + 40x + 1   ok
25:             x^5 - 2x^4 + 42x^3 + 40x + 1   ok
27: x^5 - 0.5x^4 + 101.25x^3 + 100.75x + 2.5   ok
30: x^5 - 0.5x^4 + 101.25x^3 + 100.75x + 2.5   ok
32:             x^5 - 2x^4 + 42x^3 + 40x + 1   ok
34:             x^5 - 2x^4 + 42x^3 + 40x + 1   ok
36:             x^5 - 2x^4 + 42x^3 + 40x + 1   ok
38:             x^5 - 2x^4 + 42x^3 + 40x + 1   ok

uni_frac = true (linux only, unless you have managed to find and install a decent windows console unicode font, which I haven't)

 1:                x⁵ - 2x⁴ + 42x³ + 40x + 1   ok
 4:                x⁵ - 2x⁴ + 42x³ + 40x + 1   ok
 6:                                        0   ok
 8:                x⁵ - 2x⁴ + 42x³ + 40x + 1   ok
10:                x⁵ - 2x⁴ + 42x³ + 40x + 1   ok
12:                x⁵ - 2x⁴ + 42x³ + 40x + 1   ok
14:                x⁵ - 2x⁴ + 42x³ + 40x + 1   ok
16:           1 + 40x⁻¹ + 42x⁻³ - 2x⁻⁴ + x⁻⁵   ok
19:                x⁵ - 2x⁴ + 42x³ + 40x + 1   ok
21:                x⁵ - 2x⁴ + 42x³ + 40x + 1   ok
23:                x⁵ - 2x⁴ + 42x³ + 40x + 1   ok
25:                x⁵ - 2x⁴ + 42x³ + 40x + 1   ok
27:           x⁵ - ½x⁴ + 101¼x³ + 100¾x + 2½   ok
30:           x⁵ - ½x⁴ + 101¼x³ + 100¾x + 2½   ok
32:                x⁵ - 2x⁴ + 42x³ + 40x + 1   ok
34:                x⁵ - 2x⁴ + 42x³ + 40x + 1   ok
36:                x⁵ - 2x⁴ + 42x³ + 40x + 1   ok
38:                x⁵ - 2x⁴ + 42x³ + 40x + 1   ok