For the India weekly dataset, we may select our target products based on previous stat


India product selection about humanitas HOT 4 CLOSED

halccw commented on August 17, 2024

India product selection

from humanitas.

Comments (4)

mstefanro commented on August 17, 2024

@ChingChia

Cut-off rate 0.2 means picking only those for which at least 80% of the series is known prior to interpolation?

Important:
This may be confusing, but the "region" column in the daily and weekly datasets really means city, NOT region. To get the region, you need to join the datasets with the /data/india/csv_daily/agmarknet.nic.in/regions.csv file. I suggest we rename the column now to avoid future confusion.
Since our prediction model is per-region rather than per-city, maybe you should base your stats on per-region instead. When you are saying "35-40 cities" it is not very informative, because they may all be from the same region. And we are going to merge them using PCA (or averages?) in the end so we would really have one city, if it is indeed the case that they are all in the same region.
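A minimal sketch of the rename-and-join suggested above, on toy data. The layout of regions.csv is not shown in this thread, so the `city` and `region` columns of the lookup table (and the toy rows) are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-in for the price dataset; "region" actually holds city names.
prices = pd.DataFrame({
    "date": pd.to_datetime(["2007-01-05", "2007-01-05", "2007-01-05"]),
    "region": ["Pune", "Nagpur", "Chennai"],
    "product": ["Rice", "Rice", "Rice"],
    "price": [12.0, 11.5, 13.0],
})

# Toy stand-in for regions.csv; its real column names are an assumption.
regions = pd.DataFrame({
    "city": ["Pune", "Nagpur", "Chennai"],
    "region": ["Maharashtra", "Maharashtra", "Tamil Nadu"],
})

# Rename the misleading column first, then attach the real region.
prices = prices.rename(columns={"region": "city"})
merged = prices.merge(regions, on="city", how="left", validate="many_to_one")

# Per-region statistics: how many distinct cities report in each region.
cities_per_region = merged.groupby("region")["city"].nunique()
print(cities_per_region)
```

Counting cities per region this way would show directly whether "35-40 cities" are spread over many regions or concentrated in one.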

Besides, it seems that we do not have to worry about the subproduct dimension except for rice.

Our most important data-set is the daily one, not the weekly one. On the daily one, rice has 100 subproducts, onion has 26 subproducts, wheat has 68 subproducts etc. So we do have to worry about both city and subproducts.
What we would like to do is the following:

let D be a mapping from all (R, P) to a time series
for each region R:
|    for each product P:
|    |   let M be a matrix.
|    |   for each subproduct SP (of product P):
|    |   |   for each city C (of region R):
|    |   |   |   let T be the time-series corresponding to (R,P,SP,C)
|    |   |   |   interpolate T to obtain a full time-series
|    |   |   |   add the vector T as a column to matrix M
|    |   let T = PCA(M, 1)
|    |   store a mapping from (R, P) to T into D
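The pseudocode above could be sketched in pandas roughly as follows. This is only one possible reading: the column names (`date`, `region`, `city`, `product`, `subproduct`, `price`) are hypothetical, PCA is done with a plain SVD rather than any particular library, and every (subproduct, city) series is assumed to have at least one observation inside `all_dates`.

```python
import numpy as np
import pandas as pd

def first_principal_component(M: np.ndarray) -> np.ndarray:
    """Project the rows of M (time x series) onto the first principal axis."""
    centered = M - M.mean(axis=0)
    # Rows of Vt are the principal axes of the centered data.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[0]

def build_region_product_series(df: pd.DataFrame, all_dates: pd.DatetimeIndex) -> dict:
    """Return D, a mapping from (region, product) to one merged time series."""
    D = {}
    for (region, product), group in df.groupby(["region", "product"]):
        columns = []
        for _, sub in group.groupby(["subproduct", "city"]):
            # One value per date, laid out on the full date grid.
            T = sub.groupby("date")["price"].mean().reindex(all_dates)
            # Fill gaps linearly in time; ends are padded with the nearest value.
            T = T.interpolate(method="time", limit_direction="both")
            columns.append(T.to_numpy())
        M = np.column_stack(columns)
        D[(region, product)] = pd.Series(first_principal_component(M), index=all_dates)
    return D

# Toy example: one region, one product, two cities, one missing observation.
all_dates = pd.date_range("2007-01-05", periods=4, freq="W-FRI")
df = pd.DataFrame({
    "date": list(all_dates[[0, 2, 3]]) + list(all_dates),
    "region": ["R1"] * 7,
    "city": ["A"] * 3 + ["B"] * 4,
    "product": ["Rice"] * 7,
    "subproduct": ["Fine"] * 7,
    "price": [1.0, 3.0, 4.0, 2.0, 4.0, 6.0, 8.0],
})
D = build_region_product_series(df, all_dates)
T = D[("R1", "Rice")]
```

Only the date grid changes between the daily and weekly datasets (for example `freq="D"` instead of `freq="W-FRI"`). Note that the sign of a principal component is arbitrary, so the merged series may come out mirrored and is centered around zero rather than expressed in price units.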

Try to write your code so that it works on both the daily and weekly datasets. The only differences between the datasets are the date range you have to pick and the gap between dates for interpolation (1 week vs. 1 day).
I can provide help with implementing this after we meet. We first need to go over your code.

One extra difficulty for the weekly dataset is that you might have to account for prices reported in the same week but on different days (I don't know if this occurs in the data, you should check). If that is the case, then you should really interpolate on a week-of-the-year index rather than a date index.
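If off-day reports do turn up, one way to get a week-based index is to bin every observation into the week ending on Friday before interpolating. A small sketch, assuming weeks run Saturday to Friday and that several reports in one week should be averaged (both are assumptions, not something stated in this thread):

```python
import pandas as pd

def to_weekly(series: pd.Series) -> pd.Series:
    """Snap observations to the Friday that ends their week, then fill gaps."""
    # Each bin covers Saturday..Friday and is labelled with its Friday.
    weekly = series.resample("W-FRI").mean()
    # Empty weeks come out as NaN and are filled linearly.
    return weekly.interpolate(method="linear")

# Toy series: the first price was reported on a Wednesday, the rest on Fridays.
raw = pd.Series(
    [10.0, 12.0, 16.0],
    index=pd.to_datetime(["2007-01-03", "2007-01-12", "2007-01-26"]),
)
weekly = to_weekly(raw)
print(weekly)
```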


halccw commented on August 17, 2024

@mstefanro

Yes, 0.2 cutoff rate means choosing those series with at least 80% non-NaN data points before interpolation.
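For reference, that cutoff rule can be written as a small filter. This is a sketch on toy data, assuming a wide table with one column per series and NaN for missing points; the function name is made up for illustration.

```python
import numpy as np
import pandas as pd

def select_by_cutoff(wide: pd.DataFrame, cutoff: float = 0.2) -> pd.DataFrame:
    """Keep columns whose fraction of NaN values is at most `cutoff`."""
    nan_fraction = wide.isna().mean()
    return wide.loc[:, nan_fraction <= cutoff]

idx = pd.date_range("2007-01-05", periods=10, freq="W-FRI")
wide = pd.DataFrame({
    # 1 of 10 points missing (10%): kept.
    "dense": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    # 8 of 10 points missing (80%): dropped.
    "sparse": [1.0] + [np.nan] * 8 + [10.0],
}, index=idx)

kept = select_by_cutoff(wide)
print(list(kept.columns))
```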

We can easily group series in the same region by looping region[0] = [city1, city2...]. I will add stats on region tomorrow.

The final point you mentioned is fine. Prices are always reported on Fridays.

in: all_dates_raw = sorted(list(set(df['date'])))
in: all_dates = pd.date_range(all_dates_raw[0], all_dates_raw[-1], freq='W-FRI')
in: list(set(all_dates) - set(all_dates_raw))

out: 
[Timestamp('2007-03-02 00:00:00', tz=None),
 Timestamp('2007-03-09 00:00:00', tz=None),
 Timestamp('2007-03-16 00:00:00', tz=None),
 Timestamp('2007-03-23 00:00:00', tz=None),
 Timestamp('2007-03-30 00:00:00', tz=None),
 Timestamp('2007-04-06 00:00:00', tz=None),
 Timestamp('2007-04-13 00:00:00', tz=None),
 Timestamp('2007-04-20 00:00:00', tz=None),
 Timestamp('2007-04-27 00:00:00', tz=None),
 Timestamp('2007-05-04 00:00:00', tz=None)]
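Note that the set difference above lists the Fridays that are absent from the data. A complementary check, sketched below on made-up dates, would also confirm that no reported date falls on a day other than Friday. The function name and the toy dates are illustrative only.

```python
import pandas as pd

def check_weekly_dates(dates):
    """Return (Fridays missing from the data, reported dates that are not Fridays)."""
    raw = pd.DatetimeIndex(sorted(set(dates)))
    expected = pd.date_range(raw[0], raw[-1], freq="W-FRI")
    missing_fridays = expected.difference(raw)
    # dayofweek counts from Monday=0, so Friday is 4.
    non_fridays = raw[raw.dayofweek != 4]
    return missing_fridays, non_fridays

# Toy dates with a gap between late February and mid May 2007.
dates = pd.to_datetime(["2007-02-23", "2007-05-11", "2007-05-18"])
missing, off_day = check_weekly_dates(dates)
print(len(missing), len(off_day))
```

An empty second result is what would back up the claim that prices are always reported on Fridays.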


mstefanro commented on August 17, 2024

Thanks for the feedback.
I don't think you have to redo the statistics. I merely wanted to let you know that in the end we're going to need at least one city in each region of interest.

On 04/23/2014 12:17 AM, chingchia wrote:

> Yes, 0.2 cutoff rate means choosing those series with at least 80% non-NaN data points before interpolation.
> [...]



f4bD3v commented on August 17, 2024

Among the series with an acceptable cutoff rate, we should select those for important commodities:

"Rice is the staple of the south, while bread => wheat is the staple of the north, of course with some cross over. Environmental conditions support this trend; with the largest rice growing in the south and wheat grown mainly in the north. Dal, which is Hindi for lentil, is eaten all over."

"Common vegetables used in cooking; potato, onion, okra, green beans, peas, cauliflower, capsicum, carrot (which are red), mushrooms, eggplant, chilli."

"Available fruits include apples, oranges, mandarins (which they call oranges), bananas, mango and pineapple."

source: http://www.thetravelalmanac.com/india/indian-food.htm

According to this PDF, Groundnut Oil and Peanut Oil are the most used oils in India:
http://www.umbrellaindia.com/Different-types-oils.pdf

