Bias vs. Variance (Part 3)

After a long wait, the third part of the Bias vs. Variance series is out!

As a reminder, the series is divided as follows:

  • Part one: the concepts of bias and variance
  • Part two: computing bias and variance
  • Part three: methods for reducing bias and variance

In short, bias reflects how closely the model's predictions approach the true values, while variance reflects how consistent the model's results are across different datasets.

Although we know how to compute a model's bias and variance “explicitly”, the process can be computationally expensive, depending on the size of the dataset (number of variables and samples) and the complexity of the model. We therefore need other ways to infer whether we have a bias problem or a variance problem.
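One common indirect check, offered here as a minimal sketch and not necessarily the approach the full post takes, is to compare the error on the training set with the error on a held-out validation set. The synthetic data (a noisy sine), the k-nearest-neighbors regressor, and the values of k are all assumptions chosen for illustration.

```python
import math
import random


def make_data(n, rng, noise=0.3):
    """Sample n points from y = sin(x) + Gaussian noise on [0, 2*pi]."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of a k-NN regressor evaluated on (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(200, rng)
val_x, val_y = make_data(200, rng)

results = {}
for k in (1, 15, 200):  # very flexible, moderate, and very rigid models
    train_err = mse(train_x, train_y, train_x, train_y, k)
    val_err = mse(train_x, train_y, val_x, val_y, k)
    results[k] = (train_err, val_err)
    print(f"k={k:3d}  train MSE={train_err:.3f}  validation MSE={val_err:.3f}")
```

With k=1 the model memorizes the training set, so a training error of zero next to a clearly larger validation error points to a variance problem. With k=200 every prediction is the global mean, so both errors are high and close to each other, which points to a bias problem.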


The Rise of Analytics 3.0

An excellent presentation by Thomas H. Davenport, regarded as one of the most influential data scientists in the world.

The Rise of Analytics 3.0: How to Compete in the Data Economy

In this presentation, he divides analytics into three periods:

  1. Analytics 1.0 -> Traditional
    • Mostly descriptive analysis and reporting
    • Internal, structured data
    • Little contact between analysts and the business side
    • Support for internal decisions
  2. Analytics 2.0 -> Big Data
    • Predictive and prescriptive analysis
    • Complex, unstructured data from different sources (internal and external)
    • New computational and analytical resources (Machine Learning!)
    • Data Scientists
    • Data-driven products and services (online companies)
  3. Analytics 3.0 -> Data Economy
    • Every decision is based on or influenced by data
    • Fast delivery of insights
    • Analysis tools are available to the people who make the decisions
    • Analysis is embedded in operational and decision processes
    • Any company can create data-driven products and services

MIT to offer its first professional MOOC in big data


The Massachusetts Institute of Technology has been involved in online education since the early days, and now it’s taking it a step further. Yesterday, the college announced its first online, professional-leaning Massive Open Online Course (MOOC), entitled “Tackling the Challenge of Big Data.”

Led by a dozen faculty from the university’s Computer Science and Artificial Intelligence Laboratory (CSAIL) at the School of Engineering, the four-week course starts at the beginning of March and is aimed specifically at technical professionals and executives rather than academics. The course is the first in a new set of courses offered by the university called Online X, which offers professional classes through the edX platform.

One important thing, though: these classes may be open, but they don’t come cheap. Participating in the course will run users $495 — far from the free price tags of many MOOCs available. But it’s likely that extra cost…


Synference thinks A/B testing can get a lot smarter with machine learning


Say a web publisher wants to find out which banner ad is most appealing to which audience, or which price point will make a certain user more likely to buy. Normally it would use multivariate A/B testing, the process of showing different versions of the same screen or screen elements to users and gathering data on their reactions. But the process is lengthy, and testing numerous variables such as location, time of day, or browser used spreads the data thin.

Synference, an API and company that is launching today, aims to solve this.

The Ireland-based operation uses A/B testing, machine learning and basic user data garnered from IP addresses and user agent. As the API receives user feedback — did she click on a banner or not? — Synference detects patterns of user behavior and updates its statistical model accordingly. It also allows companies to exploit this information before…
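The excerpt does not say which algorithm Synference uses. As a generic illustration of updating a statistical model from click feedback while the test is still running, here is a minimal sketch of Beta-Bernoulli Thompson sampling, a standard multi-armed bandit technique. The banner names, click rates, and the `ThompsonSampler` class are hypothetical and are not part of any Synference API.

```python
import random


class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over a fixed set of variants."""

    def __init__(self, variants, rng):
        self.rng = rng
        # Per variant: [clicks + 1, non-clicks + 1], i.e. a uniform Beta(1, 1) prior.
        self.counts = {v: [1, 1] for v in variants}

    def choose(self):
        """Draw a plausible click rate for each variant and show the best draw."""
        draws = {v: self.rng.betavariate(a, b) for v, (a, b) in self.counts.items()}
        return max(draws, key=draws.get)

    def update(self, variant, clicked):
        """Fold one observation (click or no click) into the variant's posterior."""
        self.counts[variant][0 if clicked else 1] += 1


rng = random.Random(42)  # fixed seed so the simulation is repeatable
# Made-up "true" click rates, used only to simulate user feedback.
true_rates = {"banner_a": 0.02, "banner_b": 0.05, "banner_c": 0.10}

sampler = ThompsonSampler(true_rates, rng)
shown = {v: 0 for v in true_rates}
for _ in range(5000):
    variant = sampler.choose()
    clicked = rng.random() < true_rates[variant]  # simulated user reaction
    sampler.update(variant, clicked)
    shown[variant] += 1

print(shown)  # impressions per banner; traffic should drift toward the best one
```

Unlike a fixed-split A/B test, the sampler shifts traffic toward better-performing variants as evidence accumulates, so fewer impressions are spent on weak options.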


Maybe big data is the killer app for Google’s cloud


Google’s Compute Engine cloud doesn’t yet have a Hadoop offering of its own, but the platform is making a name for itself as a viable, if not ideal, place to run big data workloads. The latest validation came on Thursday when Qubole, the Hadoop-as-a-service startup from Hive creators Ashish Thusoo and Joydeep Sen Sarma, announced an option that users can choose to run on Compute Engine, which they claim provides better performance than Amazon Web Services.

Specifically, a company spokesperson told me via email, Qubole has seen 2-3x faster startup times for virtual servers using Compute Engine over Amazon EC2 and more reliable performance from Google Cloud Storage than from Amazon S3. We’ll also assume that AWS is the “CloudX” against which Qubole engineer Praveen Seluka benchmarked Compute Engine, some results of which he shared on the Google Cloud Platform blog. Qubole did launch as an AWS-based service…
