Comments (27)
What version of the openHistorian are you using?
How is the data flowing into the openHistorian?
Is this PMU data?
Thanks,
Ritchie
from openhistorian.
@ritchiecarroll ,
Thank you Ritchie!
We are running openHistorian 2.0.415 on Windows Server 2012 with .NET Framework 4.5, and the openHistorian 2.1 release on Windows 10 with .NET Framework 4.6. Both of them suffer from missing data.
It is FDR data using the FNET protocol. We have a self-developed server which receives data from the FDRs directly, then forwards it to the openHistorian. The 'ConnectionString' of the devices in the openHistorian configuration database is 'transportprotocol=Tcp;......'. According to the action adapter, all data is received by the openHistorian.
> What version of the openHistorian are you using?
> How is the data flowing into the openHistorian?
> Is this PMU data?
So you have developed a custom action adapter that runs inside the openHistorian that receives data from the FDRs? Or are you using the built-in FNET device protocol?
The only thing I can think of is that if the timestamps of the incoming data were duplicated - these points would be harder to extract since the historian "key" is based on ID, Timestamp then a counter.
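The key scheme described here can be sketched in Python (illustrative only; the actual historian is C#, and the names below are not its API):

```python
# Illustrative sketch of the historian "key" described above: each archived
# point is keyed by (ID, timestamp, sequence). A per-(ID, timestamp)
# counter disambiguates duplicated timestamps, which is why such points
# are harder to extract - the reader must also know the sequence number.
from collections import defaultdict

def build_keys(points):
    """points: iterable of (point_id, timestamp) pairs."""
    counters = defaultdict(int)  # (id, timestamp) -> next sequence number
    keys = []
    for pid, ts in points:
        seq = counters[(pid, ts)]
        counters[(pid, ts)] = seq + 1
        keys.append((pid, ts, seq))
    return keys

# Two samples share ID 7 and timestamp 1000 and get sequences 0 and 1.
print(build_keys([(7, 1000), (7, 1000), (7, 1001)]))
```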
@ritchiecarroll
A custom action adapter runs inside the openHistorian. The adapter receives data from the openHistorian by overriding ConcentratorBase.PublishFrame(IFrame frame, int index).
> So you have developed a custom action adapter that runs inside the openHistorian that receives data from the FDRs? Or are you using the built-in FNET device protocol?
@ritchiecarroll
In the custom action adapter, I write the timestamp and value count of every IFrame to a text file. I didn't find any duplicated timestamps.
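That check can be automated over the adapter's text file; a hypothetical Python sketch (the log format assumed here, one tab-separated "timestamp, value count" line per IFrame, is an illustration):

```python
# Hypothetical check over the adapter's frame log: report any frame
# timestamps that occur more than once (a candidate cause of hard-to-read
# points, per the key scheme discussed earlier in the thread).
from collections import Counter

def duplicated_timestamps(lines):
    stamps = [line.split("\t")[0] for line in lines if line.strip()]
    return [ts for ts, n in Counter(stamps).items() if n > 1]

log = [
    "2017-02-08 23:18:20.5\t84",
    "2017-02-08 23:18:20.5\t84",  # duplicated frame timestamp
    "2017-02-08 23:18:20.6\t85",
]
print(duplicated_timestamps(log))  # -> ['2017-02-08 23:18:20.5']
```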
> The only thing I can think of is that if the timestamps of the incoming data were duplicated - these points would be harder to extract since the historian "key" is based on ID, Timestamp, then a counter.
I am assuming the actual F-NET data comes in over a standard device connection using the built-in UTK FNET protocol?
If so, the steps below will help us see whether the outage is due to a reconnect - it could be that there is a delay in the source data stream and the device simply reconnects, a standard operation when data stops flowing. By default this timeout is set to 5 seconds, which may be too short, causing the device to reconnect. By the way, this setting is called "Data Loss Interval" and can be found on the openHistorian Manager device configuration screen.
From the openHistorian machine connected to the remote FNET streams, run the "Statistics Trending Tool" - as named in the start menu; the actual EXE name is StatisticView.exe in the openHistorian installation folder. Once running, connect to the Statistics historian - note that the application should already default to this archive when no other local archive is installed, but just in case, the default statistics archive path is "C:\Program Files\openHistorian\Statistics\" - this assumes openHistorian was installed on the C: drive.
Now, for the connected FNET device, find the statistic that ends with "!IS:ST8", i.e., input stream statistic number 8, which defines a boolean value representing whether the input stream was continually connected during the last reporting interval. For statistics the default reporting interval is every 10 seconds, so if the stat is non-zero the device was continually connected over the 10-second period.
If you select this statistic and trend this value over the time window for the data gap you found, you will then know if the device connection was interrupted and hence caused the data gap.
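Trending that statistic reduces to a simple scan; a hypothetical Python sketch (the sample data and function name are made up, only the ST8 semantics come from this thread):

```python
# Hypothetical sketch: "!IS:ST8" reports, every 10 seconds, whether the
# input stream stayed connected for the whole interval (non-zero = yes).
# Given (utc_seconds, value) samples, list the intervals with a break.
def disconnected_intervals(st8_samples):
    return [ts for ts, value in st8_samples if value == 0]

samples = [(0, 1), (10, 1), (20, 0), (30, 1)]  # fabricated sample data
print(disconnected_intervals(samples))  # -> [20]
```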
If the device connection was interrupted, you can go through the logs (using the "Log File Viewer" application) and check the messages for the device around the same timeframe - from this you can get a clue whether the connection was terminated by the remote device or reset because data stopped flowing, often due to a change in the network, e.g., a router reboot, etc.
If this is not the case then there may be something else going on and I would suggest carefully monitoring the logs for errors or connection issues.
Thanks,
Ritchie
@ritchiecarroll
Thank you for your kind help!
All the devices use the built-in UTK FNET protocol.
One of the time windows of missing data is 02/08/2017 0:02:39 - 0:02:42. I picked four devices that have missing data, as shown below. It seems all of them were continually connected in the time window.
Actually, almost all devices (around 100) lost data in that time window, and we didn't find any connection status change. The status was updated at 0:02:26, 36, 46, 56... And the action adapter received all the data.
We may want you to install a newer build so that we can have enhanced logging with which we can evaluate what happens when a gap occurs. Would this be possible? If so, you can update your existing version with a nightly build:
https://www.gridprotectionalliance.org/NightlyBuilds/openHistorian/Beta/openHistorian.Installs.zip
With the new version we have a new detailed logging system that will provide much more detail. Also, the new archive log files are time-stamped so that we can find overlapping logs around the time of a detected data gap. After installation the log files will end up in the "C:\Program Files\openHistorian\Logs" folder.
Thanks,
Ritchie
Also - what about data quality? Are we sure received data is not NaN? Are you writing captured data from your custom adapter to a file that we could review?
Thanks,
Ritchie
@ritchiecarroll
That's great! Let me install this newer build to get detailed log information.
> Also - what about data quality? Are we sure received data is not NaN? Are you writing captured data from your custom adapter to a file that we could review?
Right now the adapter doesn't write data to files; only the timestamp and value count are captured. Since we have another self-developed server receiving the same forwarded data without any problem, the data quality should be good. To be 100% sure, I'll write all the data to files after installing the aforementioned build.
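A minimal sketch of such a per-frame dump, in Python (the line format and names are assumptions for illustration; the real adapter is C#):

```python
# Hypothetical per-frame dump line: UTC timestamp, measurement count, and
# how many of the received values were NaN (answering the data-quality
# question raised above).
import math

def frame_line(timestamp, values):
    nan_count = sum(1 for v in values if math.isnan(v))
    return f"{timestamp}\t{len(values)}\t{nan_count}"

print(frame_line("2017-02-08 23:18:20.5", [1.0, float("nan"), 2.5]))
```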
Thank you very much!
Wenpeng
@ritchiecarroll
With the new build, I got another time window of missing data: [UTC 2/8/2017 23:18:20.5, 23:18:22.7].
- I didn't find any connection status change.
- The custom action adapter received all the data, and the data quality is good.
- The value counts (the last column) received by the adapter didn't change significantly, staying around 80 to 88 over several hours.
It seems all the data is received by openHistorian and the data quality is good, but something stopped openHistorian from archiving some of the time frames.
OK, thanks. Let's get a copy of the log files around that time frame and see what interesting things were going on.
@ritchiecarroll
Here are the local time (column 0) and UTC time (column 1).
This is a screenshot of the Log File Viewer. Usually there are at least 5 log messages every 10 seconds, but there isn't any log from 18:18:18 - 18:18:28. The log file is also attached.
This is the log file. Please change the extension back to .logz; I'm not allowed to upload .logz or .zip files...
20170208-225631-007026 - openHistorian - 1.pdf
Are there any other log files which may have useful information?
Looks like there is a pause, which may be Garbage Collection related, and we will want to validate the GC settings in your config file (i.e., openHistorian.exe.config).
Regardless, the source device simply reconnects when no data is received during the pause - again, a standard operation when data stops flowing. By default this timeout is set to 5 seconds, which must be too short, so the device is reconnecting. This setting is called "Data Loss Interval" and can be found on the openHistorian Manager device configuration screen.
Suggest changing this to 15 seconds for each input device to see if data loss stops.
Thanks,
Ritchie
Added EnableThreadPoolMonitoring=; and set 'Data Loss Interval' to 15 by modifying the 'Device' table in the configuration database directly, and openHistorian has been restarted. Hope it helps.
> To verify this is the root cause, turn on ThreadPoolMonitoring and restart openHistorian. You can do this by modifying your openhistorian.exe.config file. Find the line that says OptimizationsConnectionString and add EnableThreadPoolMonitoring=;
> Suggest changing this to 15 seconds for each input device to see if data loss stops.
I still have some questions. According to the custom action adapter and the statistics database, all the data is received by openHistorian and all devices are continually connected. Also, if there were a pause, we should observe an obvious change in the time difference between the timestamp (UTC) and the log time, which is the local time when the custom adapter received the time frame. But according to the log files, the time difference is 2.3 seconds and didn't change.
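For reference, the suggested change would look roughly like this in openHistorian.exe.config (a sketch; the surrounding section layout and any other values in the string are assumptions - only the setting name and flag come from this thread):

```xml
<!-- Sketch only: locate the existing OptimizationsConnectionString entry
     and append EnableThreadPoolMonitoring=; to its value. -->
<add name="OptimizationsConnectionString" value="EnableThreadPoolMonitoring=;" />
```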
I agree - that's what is strange about all this. Also, there are some very unusual messages in your error log. Perhaps the next best step is to have a WebEx and take a look?
Want to send me an e-mail about scheduling this?
Thanks,
Ritchie
@ritchiecarroll
I modified the configuration as you and Steven Chisholm suggested, and openHistorian has run for several hours. There is still missing data.
Thank you for taking the time to help us! I'm flexible about having a WebEx. When are you available?
Tomorrow afternoon? Please send me an e-mail at my GPA e-mail address and we will arrange this...
@ritchiecarroll @StevenChisholm
I'm digging into the source code, and traced it down to GSF.Snap.Collection.SortedPointBuffer<TKey,TValue>.TryEnqueue(TKey, TValue) for the PrebufferWriter. It seems the SortedPointBuffer received all the data. I need to go deeper to find where the data gets lost...
@ritchiecarroll @StevenChisholm
It seems data may sometimes get lost when creating a UnionTreeStream<TKey, TValue>. UnionTreeStream<TKey, TValue> is used to merge a PendingTable to the next level, or to roll over pending tables from all levels to the NextStage archive. But sometimes the merging procedure stops unexpectedly and the remaining data gets lost. One example: 10 PendingTable2 are merged into PendingTable3. There are 53021 values in the 10 PendingTable2, but only 15674 of them are copied to PendingTable3. This could also happen in the rollover from the FirstStage to the NextStage archive.
Still don't know what stops the merging procedure.
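What the rollover should do can be sketched in a few lines of Python (illustrative only; the real UnionTreeStream is C# and far more involved): a union of sorted streams must emit every value, even when a key is duplicated across streams.

```python
# Minimal sketch of a lossless union of sorted point streams - the job
# UnionTreeStream performs during a merge or rollover. Every input value
# must appear in the output, duplicated keys included.
import heapq

def union_streams(streams):
    """streams: lists of (key, value) pairs, each sorted by key."""
    return list(heapq.merge(*streams))

s1 = [(1, "a"), (3, "c")]
s2 = [(2, "b"), (3, "d")]  # key 3 is duplicated across streams
merged = union_streams([s1, s2])
print(merged)            # -> [(1, 'a'), (2, 'b'), (3, 'c'), (3, 'd')]
print(len(merged) == 4)  # no values lost
```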
@ritchiecarroll @StevenChisholm
Duplicated data may be the reason, but I'm not sure yet...
@StevenChisholm
@ritchiecarroll
I think I got the bug. It is in UnionTreeStreamSortHelper.SortAssumingIncreased(int index), at line 116 as shown below. x should not start from 0 but from index. Once there are duplicated keys in the BufferedTreeStreams, and the duplicated key happens to be the last key of the second (or third, and so on) BufferedTreeStream, this sort function will be called and make a mistake.
One of the captured examples is shown below, in which 2017-02-12 00:37:22.3000000/785 is the duplicated key.
There are 3 sections:
- The original order of the BufferedTreeStreams and their cache key and valid flag before removing the duplicated key.
- Information on the streams after removing the duplicated key by advancing the position of the second duplicated key.
- Information and order of the streams after sorting the streams.
It caused the 3 remaining streams to be ignored.
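The fix can be illustrated with a simplified Python model (the loop below is an illustration, not the actual GSF C# source): streams are kept ordered by their current cache key, and after the stream at index advances past a duplicate key, only positions from index onward can be out of order, so the bubble-down must start at index rather than 0.

```python
# Simplified model of SortAssumingIncreased: keep streams ordered by each
# stream's current key; the stream at `index` just advanced, so its key
# can only have increased. Bubble it toward the end, starting at `index`.
# An exhausted stream (key None) sorts after every live stream.
def sort_assuming_increased(streams, index):
    x = index  # the fix described above: start at index, not 0
    while x + 1 < len(streams):
        cur, nxt = streams[x]["key"], streams[x + 1]["key"]
        if cur is not None and (nxt is None or cur <= nxt):
            break  # suffix is in order; entries before index are untouched
        streams[x], streams[x + 1] = streams[x + 1], streams[x]
        x += 1
    return streams

streams = [{"key": 5}, {"key": 9}, {"key": 7}]  # stream 1 advanced 5 -> 9
sort_assuming_increased(streams, 1)
print([s["key"] for s in streams])  # -> [5, 7, 9]
```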
The problem was solved.
Sometimes a meter may resend data due to network issues; this is why we got duplicated keys.
Previously, data loss happened around 7 times per hour on average. Now there isn't any missing data in the last 15 hours.
Thank you for all your help! @ritchiecarroll @StevenChisholm
Best wishes,
Wenpeng Yu
I'll add my thank you as well - nice job @yuwenpeng! I'll close this for now given the discovered fix:
437ed9d
Thanks again!
Ritchie