Twitter data encoding and fonts



I'm using the streaming API to collect social data and present it in a UI after some analytical operations. I'm having a hard time displaying the Twitter data correctly in the UI. Some characters, such as emoticons, curly quotes and other multibyte characters, get mangled and show up as either question marks or junk.
What I have tried so far:
Changing the encoding to UTF-8 throughout (no luck)
Converting emoticons to their equivalent HTML hex codes, but some of the emoticons are still missed (no luck)

Please point me in the right direction, as I’ve tried lots of things.

Technologies involved: Java, Unix, Kafka, HBase, Solr (data passes through all of these in this sequence)


For dealing with emoji, have a look at that will definitely keep things consistent.
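
On the HTML hex conversion mentioned in the question: the conversion code isn't shown, so this is only a guess, but one common reason some emoji get left behind in Java is looping over `char` values instead of code points. Emoji above U+FFFF are stored as two `char`s (a surrogate pair), so a per-`char` loop never sees the real code point. Below is a minimal sketch that iterates by code point; the class and method names (`EmojiEntities`, `toHtmlEntities`) are made up for illustration, and it does not escape `<`, `>` or `&`, which a real UI would still need.

```java
public class EmojiEntities {

    // Replaces every non-ASCII code point with a hex HTML entity.
    // Iterating with codePoints() keeps surrogate pairs together,
    // so an emoji becomes one entity instead of two broken halves.
    public static String toHtmlEntities(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.append((char) cp); // plain ASCII is copied unchanged
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Curly quotes around "hi", followed by the emoji U+1F600.
        String sample = "\u201Chi\u201D \uD83D\uDE00";
        System.out.println(toHtmlEntities(sample));
    }
}
```

The output is pure ASCII, so it should survive later stages even if one of them is not UTF-8 clean, but it only helps if the string is still intact at the point where the conversion runs.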

UTF-8 is the right thing to use. If it’s not working properly, something in the pipeline may still be mangling the encoding. It is hard to tell where without checking the text at each stage in turn.
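
One way to narrow it down on the Java side is to make every bytes-to-string conversion name its charset explicitly. `String.getBytes()` and `new String(bytes)` without a charset argument use the JVM's default charset, which on JVMs before Java 18 depends on the platform settings and is often not UTF-8 on a Unix box with a `C` or `POSIX` locale. The sketch below is not taken from the pipeline in the question; the class and method names (`EncodingCheck`, `toBytes`, `fromBytes`, `mangledBy`) are hypothetical. It shows a lossless UTF-8 round trip next to a simulated stage that uses a charset which cannot represent the text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    // Encode with an explicit charset instead of the platform default.
    public static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same explicit charset.
    public static String fromBytes(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Simulates a stage that encodes and decodes with the wrong charset.
    // Characters the charset cannot represent are replaced during encoding,
    // and that loss cannot be undone by any later stage.
    public static String mangledBy(String text, Charset wrong) {
        return new String(text.getBytes(wrong), wrong);
    }

    public static void main(String[] args) {
        // Curly quotes around "nice", followed by the emoji U+1F600.
        String tweet = "\u201Cnice\u201D \uD83D\uDE00";

        // Compare strings in code rather than trusting the console,
        // because the terminal may itself mis-render the characters.
        System.out.println("UTF-8 round trip intact: "
                + fromBytes(toBytes(tweet)).equals(tweet));
        System.out.println("US-ASCII stage intact: "
                + mangledBy(tweet, StandardCharsets.US_ASCII).equals(tweet));
    }
}
```

Running the same kind of equality check on a known test tweet after each hop (Kafka, HBase, Solr, then the UI) should show which stage first returns something different from what went in.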

Is the UI a web interface of some sort? If so, once you have a tweet ID, this may work well: (the blockquote will be replaced with appropriate HTML rendering the tweet)