<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Contextual Bandits and Reinforcement Learning</title>
	<atom:link href="http://pavel.surmenok.com/2017/08/26/contextual-bandits-and-reinforcement-learning/feed/" rel="self" type="application/rss+xml" />
	<link>http://pavel.surmenok.com/2017/08/26/contextual-bandits-and-reinforcement-learning/</link>
	<description></description>
	<lastBuildDate>Sat, 13 Apr 2019 17:14:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
	<item>
		<title>By: Parth Radia</title>
		<link>http://pavel.surmenok.com/2017/08/26/contextual-bandits-and-reinforcement-learning/#comment-143</link>
		<dc:creator>Parth Radia</dc:creator>
		<pubDate>Thu, 16 Nov 2017 16:44:00 +0000</pubDate>
		<guid isPermaLink="false">http://pavel.surmenok.com/?p=271#comment-143</guid>
		<description><![CDATA[Ah, understood. I guess my next question is -- if one has no reliable way to build a &quot;gym&quot; (simulation) for reinforcement learning, are contextual bandits the best initial approach?

The reason I am asking is that I have a domain where:
- There is no initial data (hence the bandits instead of collab. filt.)
- There is a way to collect context (hence the contextual bandits)
- There is no way to build simulations necessary for reinforcement learning.]]></description>
		<content:encoded><![CDATA[<p>Ah, understood. I guess my next question is &#8212; if one has no reliable way to build a &#8220;gym&#8221; (simulation) for reinforcement learning, are contextual bandits the best initial approach?</p>
<p>The reason I am asking is that I have a domain where:<br />
- There is no initial data (hence the bandits instead of collab. filt.)<br />
- There is a way to collect context (hence the contextual bandits)<br />
- There is no way to build simulations necessary for reinforcement learning.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: surmenok</title>
		<link>http://pavel.surmenok.com/2017/08/26/contextual-bandits-and-reinforcement-learning/#comment-142</link>
		<dc:creator>surmenok</dc:creator>
		<pubDate>Thu, 16 Nov 2017 04:59:00 +0000</pubDate>
		<guid isPermaLink="false">http://pavel.surmenok.com/?p=271#comment-142</guid>
		<description><![CDATA[Reinforcement learning models learn how to perform multiple actions. For example, in the game of chess, there can be a lot of moves before the outcome (win/draw/defeat) is observed.
Contextual bandits are a subset of reinforcement learning algorithms that are simpler: there is only one step before the outcome is observed. For example, you make one decision to select which link to show on a web page, and you get an outcome (and associated reward) after that: whether the user clicked on the link. In this sense, a contextual bandit is just a reinforcement learning algorithm reduced to one step.]]></description>
		<content:encoded><![CDATA[<p>Reinforcement learning models learn how to perform multiple actions. For example, in the game of chess, there can be a lot of moves before the outcome (win/draw/defeat) is observed.<br />
Contextual bandits are a subset of reinforcement learning algorithms that are simpler: there is only one step before the outcome is observed. For example, you make one decision to select which link to show on a web page, and you get an outcome (and associated reward) after that: whether the user clicked on the link. In this sense, a contextual bandit is just a reinforcement learning algorithm reduced to one step.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Parth Radia</title>
		<link>http://pavel.surmenok.com/2017/08/26/contextual-bandits-and-reinforcement-learning/#comment-141</link>
		<dc:creator>Parth Radia</dc:creator>
		<pubDate>Thu, 16 Nov 2017 03:12:00 +0000</pubDate>
		<guid isPermaLink="false">http://pavel.surmenok.com/?p=271#comment-141</guid>
		<description><![CDATA[Pavel, thanks for this. Bandit methods are pretty obscure and hard to learn about compared with collaborative filtering techniques.

The piece I&#039;m not understanding is the delineation between contextual bandits and reinforcement learning.

Specifically: &quot;If you get reinforcement learning algorithm with policy gradients and simplify it to a contextual bandit by reducing a number of steps to one, the model will be very similar to a supervised classification model.&quot;

 Could you expand a bit more on this?]]></description>
		<content:encoded><![CDATA[<p>Pavel, thanks for this. Bandit methods are pretty obscure and hard to learn about compared with collaborative filtering techniques.</p>
<p>The piece I&#8217;m not understanding is the delineation between contextual bandits and reinforcement learning.</p>
<p>Specifically: &#8220;If you get reinforcement learning algorithm with policy gradients and simplify it to a contextual bandit by reducing a number of steps to one, the model will be very similar to a supervised classification model.&#8221;</p>
<p> Could you expand a bit more on this?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
